SYSTEM FOR IN VITRO TRANSPOSITION USING MODIFIED TN5 TRANSPOSASE

CROSS-REFERENCE TO RELATED APPLICATION
This patent application is a continuation-in-part of a
patent application entitled "System for In Vitro
Transposition," filed March 11, 1997, for which no serial
number has yet been accorded. Applicants have petitioned for a
filing date of September 9, 1996 to be accorded to the parent
application.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
Not applicable. 

BACKGROUND OF THE INVENTION 
The present invention relates generally to the field of 
transposable nucleic acid and, more particularly, to production
and use of a modified transposase enzyme in a system for 
introducing genetic changes to nucleic acid. 

Transposable genetic elements are DNA sequences, found in 
a wide variety of prokaryotic and eukaryotic organisms, that 
can move or transpose from one position to another position in 
a genome. In vivo, intra-chromosomal transpositions as well as
transpositions between chromosomal and non-chromosomal genetic
material are known. In several systems, transposition is known 
to be under the control of a transposase enzyme that is 
typically encoded by the transposable element. The genetic 
structures and transposition mechanisms of various transposable 
elements are summarized, for example, in "Transposable Genetic
Elements" in "The Encyclopedia of Molecular Biology," Kendrew
and Lawrence, Eds., Blackwell Science, Ltd., Oxford (1994), 
incorporated herein by reference. 

In vitro transposition systems that utilize the particular
transposable elements of bacteriophage Mu and bacterial
transposon Tn10 have been described by the research groups of
Kiyoshi Mizuuchi and Nancy Kleckner, respectively.

The bacteriophage Mu system was first described by
Mizuuchi, K., "In Vitro Transposition of Bacteriophage Mu: A
Biochemical Approach to a Novel Replication Reaction,"
Cell 35:785-794 (1983) and Craigie, R., et al., "A Defined System
for the DNA Strand-Transfer Reaction at the Initiation of
Bacteriophage Mu Transposition: Protein and DNA Substrate
Requirements," P.N.A.S. U.S.A. 82:7570-7574 (1985). The DNA
donor substrate (mini-Mu) for the Mu in vitro reaction normally
requires six Mu transposase binding sites (three of about 30 bp 
at each end) and an enhancer sequence located about 1 kb from 
the left end. The donor plasmid must be supercoiled. Proteins 
required are Mu-encoded A and B proteins and host-encoded HU 
and IHF proteins. Lavoie, B. D., and G. Chaconas, "Transposition
of phage Mu DNA," Curr. Topics Microbiol. Immunol. 204:83-99
(1995). The Mu-based system is disfavored for in vitro
transposition system applications because the Mu termini are 
complex and sophisticated and because transposition requires 
additional proteins above and beyond the transposase. 

The Tn10 system was described by Morisato, D. and N.
Kleckner, "Tn10 Transposition and Circle Formation in vitro,"
Cell 51:101-111 (1987) and by Benjamin, H. W. and N. Kleckner,
"Excision Of Tn10 from the Donor Site During Transposition
Occurs By Flush Double-Strand Cleavages at the Transposon
Termini," P.N.A.S. U.S.A. 89:4648-4652 (1992). The Tn10 system
involves a supercoiled circular DNA molecule carrying the
transposable element (or a linear DNA molecule plus E. coli IHF
protein). The transposable element is defined by complex 42 bp
terminal sequences with an IHF binding site adjacent to the
inverted repeat. In fact, even longer (81 bp) ends of Tn10
were used in reported experiments. Sakai, J., et al.,
"Identification and Characterization of Pre-Cleavage Synaptic
Complex that is an Early Intermediate in Tn10 transposition,"
E.M.B.O. J. 14:4374-4383 (1995). In the Tn10 system, chemical
treatment of the transposase protein is essential to support
active transposition. In addition, the termini of the Tn10
element limit its utility in a generalized in vitro
transposition system.

Both the Mu- and Tn10-based in vitro transposition systems
are further limited in that they are active only on covalently
closed circular, supercoiled DNA targets. What is desired is a
more broadly applicable in vitro transposition system that
utilizes shorter, more well defined termini and which is active
on target DNA of any structure (linear, relaxed circular, and
supercoiled circular DNA).

BRIEF SUMMARY OF THE INVENTION 
The present invention is summarized in that an in vitro
transposition system comprises a preparation of a suitably
modified transposase of bacterial transposon Tn5, a donor DNA
molecule that includes a transposable element, and a target DNA
molecule into which the transposable element can transpose, all
provided in a suitable reaction buffer.

The transposable element of the donor DNA molecule is
characterized as a transposable DNA sequence of interest, the
DNA sequence of interest being flanked at its 5'- and 3'-ends
by short repeat sequences that are acted upon in trans by Tn5
transposase.

The invention is further summarized in that the suitably
modified transposase enzyme comprises two classes of
differences from wild type Tn5 transposase, where each class
has a separate measurable effect upon the overall transposition
activity of the enzyme and where a greater effect is observed
when both modifications are present. The suitably modified
enzyme both (1) binds to the repeat sequences of the donor DNA
with greater avidity than wild type Tn5 transposase ("class (1)
mutation") and (2) is less likely than the wild type protein to
assume an inactive multimeric form ("class (2) mutation"). A
suitably modified Tn5 transposase of the present invention that
contains both class (1) and class (2) modifications induces at
least about 100-fold (±10%) more transposition than the wild
type enzyme, when tested in combination in an in vivo
conjugation assay as described by Weinreich, M. D., "Evidence
that the cis Preference of the Tn5 Transposase is Caused by
Nonproductive Multimerization," Genes and Development
8:2363-2374 (1994), incorporated herein by reference. Under optimal
conditions, transposition using the modified transposase may be
higher. A modified transposase containing only a class (1)
mutation binds to the repeat sequences with sufficiently
greater avidity than the wild type Tn5 transposase that such a
Tn5 transposase induces about 5- to 50-fold more transposition
than the wild type enzyme, when measured in vivo. A modified
transposase containing only a class (2) mutation is
sufficiently less likely than the wild type Tn5 transposase to
assume the multimeric form that such a Tn5 transposase also
induces about 5- to 50-fold more transposition than the wild
type enzyme, when measured in vivo.

In another aspect, the invention is summarized in that a
method for transposing the transposable element from the donor
DNA into the target DNA in vitro includes the steps of mixing
together the suitably modified Tn5 transposase protein, the
donor DNA, and the target DNA in a suitable reaction buffer,
allowing the enzyme to bind to the flanking repeat sequences of
the donor DNA at a temperature greater than 0°C, but no higher
than about 28°C, and then raising the temperature to
physiological temperature (about 37°C) whereupon cleavage and
strand transfer can occur.

It is an object of the present invention to provide a
useful in vitro transposition system having few structural
requirements and high efficiency.

It is another object of the present invention to provide a
method that can be broadly applied in various ways, such as to
create absolute defective mutants, to provide selective markers
to target DNA, to provide portable regions of homology to a
target DNA, to facilitate insertion of specialized DNA
sequences into target DNA, to provide primer binding sites or
tags for DNA sequencing, to facilitate production of genetic
fusions for gene expression studies and protein domain mapping,
as well as to bring together other desired combinations of DNA
sequences (combinatorial genetics).

It is a feature of the present invention that the modified 
transposase enzyme binds more tightly to DNA than does wild
type Tn5 transposase. 

It is an advantage of the present invention that the 
modified transposase facilitates in vitro transposition
reaction rates at least about 100-fold higher than can be
achieved using wild type transposase (as measured in vivo). It
is noted that the wild-type Tn5 transposase shows no detectable 
in vitro activity in the system of the present invention. 
Thus, while it is difficult to calculate an upper limit to the 
increase in activity, it is clear that hundreds, if not 
thousands, of colonies are observed when the products of in 
vitro transposition are assayed in vivo. 

It is another advantage of the present invention that in 
vitro transposition using this system can utilize donor DNA and 
target DNA that is circular or linear. 

It is yet another advantage of the present invention that 
in vitro transposition using this system requires no outside
high energy source and no protein other than the modified
transposase.

Other objects, features, and advantages of the present 
invention will become apparent upon consideration of the 
following detailed description. 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 
Fig. 1 depicts test plasmid pRZTL1, used herein to
demonstrate transposition in vitro of a transposable element
located between a pair of Tn5 outside end termini. Plasmid
pRZTL1 is also shown and described in SEQ ID NO: 3.

Fig. 2 depicts an electrophoretic analysis of plasmid
pRZTL1 before and after in vitro transposition. Data obtained
using both circular and linear plasmid substrates are shown.

Fig. 3 is an electrophoretic analysis of plasmid pRZTL1
after in vitro transposition, including further analysis of the
molecular species obtained using circular and linear plasmid
substrates.

Fig. 4 shows plasmids pRZ1496, pRZ5451 and pRZTL1, which
are detailed in the specification.

Fig. 5 shows a plot of papillae per colony over time for
various mutant OE sequences tested in vivo against EK54/MA56
transposase.

Fig. 6 shows a plot of papillae per colony over time for
various mutant OE sequences with a smaller Y-axis than is shown
in Fig. 5 tested against EK54/MA56 transposase.

Fig. 7 shows a plot of papillae per colony over time for
various mutant OE sequences tested against MA56 Tn5
transposase.

Fig. 8 shows in vivo transposition using two preferred
mutants, tested against MA56 and EK54/MA56 transposase.

DETAILED DESCRIPTION OF THE INVENTION 
It will be appreciated that this technique provides a
simple, in vitro system for introducing any transposable
element from a donor DNA into a target DNA. It is generally
accepted and understood that Tn5 transposition requires only a
pair of OE termini, located to either side of the transposable
element. These OE termini are generally thought to be 18 or 19
bases in length and are inverted repeats relative to one
another. Johnson, R. C., and W. S. Reznikoff, Nature 304:280
(1983), incorporated herein by reference. The Tn5 inverted
repeat sequences, which are referred to as "termini" even
though they need not be at the termini of the donor DNA
molecule, are well known and understood.

Apart from the need to flank the desired transposable
element with standard Tn5 outside end ("OE") termini, few other
requirements on either the donor DNA or the target DNA are
envisioned. It is thought that Tn5 has few, if any,
preferences for insertion sites, so it is possible to use the
system to introduce desired sequences at random into target
DNA. Therefore, it is believed that this method, employing the
modified transposase described herein and a simple donor DNA,
is broadly applicable to introduce changes into any target DNA,
without regard to its nucleotide sequence. It will, thus, be
applied to many problems of interest to those skilled in the
art of molecular biology.
In the method, the modified transposase protein is
combined in a suitable reaction buffer with the donor DNA and
the target DNA. A suitable reaction buffer permits the
transposition reaction to occur. A preferred, but not
necessarily optimized, buffer contains spermidine to condense
the DNA, glutamate, and magnesium, as well as a detergent,
which is preferably
3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate
("CHAPS"). The mixture can be incubated at a
temperature greater than 0°C and as high as about 28°C to
facilitate binding of the enzyme to the OE termini. Under the
buffer conditions used by the inventors in the Examples, a
pretreatment temperature of 30°C was not adequate. A preferred
temperature range is between 16°C and 28°C. A most preferred
pretreatment temperature is about 20°C. Under different buffer
conditions, however, it may be possible to use other
below-physiological temperatures for the binding step. After a short
pretreatment period of time (which has not been optimized, but
which may be as little as 30 minutes or as much as 2 hours, and
is typically 1 hour), the reaction mixture is diluted with 2
volumes of a suitable reaction buffer and shifted to
physiological conditions for several more hours (say 2-3 hours)
to permit cleavage and strand transfer to occur. A temperature
of 37°C, or thereabouts, is adequate. After about 3 hours, the
rate of transposition decreases markedly. The reaction can be
stopped by phenol-chloroform extraction and can then be
desalted by ethanol precipitation.
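
The two-stage incubation described above can be written down as a small data structure, which may help when recording or checking reaction conditions. The following is only an illustrative sketch: the class, field, and function names are hypothetical, and the numeric values are simply the typical figures and ranges stated in the preceding paragraph (about 20°C for about 1 hour, dilution with 2 volumes, then about 37°C for up to about 3 hours).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncubationStep:
    """One incubation stage of the reaction (hypothetical helper type)."""
    name: str
    temp_c: float   # incubation temperature in degrees Celsius
    minutes: int    # incubation time in minutes


# Typical values taken from the text; they are stated there as not optimized.
BINDING = IncubationStep("binding to OE termini", temp_c=20.0, minutes=60)
STRAND_TRANSFER = IncubationStep("cleavage and strand transfer",
                                 temp_c=37.0, minutes=180)


def binding_temp_ok(temp_c: float) -> bool:
    """True if temp_c is above 0 C and no higher than about 28 C."""
    return 0.0 < temp_c <= 28.0


def diluted_volume(initial_volume_ul: float, volumes_added: float = 2.0) -> float:
    """Total volume after adding `volumes_added` volumes of reaction buffer."""
    return initial_volume_ul * (1.0 + volumes_added)
```

For instance, under these assumptions a 10 microliter binding reaction diluted with 2 volumes of buffer gives a 30 microliter reaction for the 37°C step, and a 30°C pretreatment falls outside the stated binding range.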

When the DNA has been purified using conventional
purification tools, it is possible to employ simpler reaction
conditions in the in vitro transposition method. DNA of
sufficiently high purity can be prepared by passing the DNA
preparation through a resin of the type now commonly used in
the molecular biology laboratory, such as the Qiagen resin of
the Qiagen plasmid purification kit (Catalog No. 12162). When
such higher quality DNA is employed, CHAPS can be omitted from
the reaction buffer. When CHAPS is eliminated from the
reaction buffer, the reactants need not be diluted in the
manner described above. Also, the low temperature incubation
step noted above can be eliminated in favor of a single
incubation for cleavage and strand transfer at physiological
conditions. A three hour incubation at 37°C is sufficient.

Following the reaction and subsequent extraction steps,
transposition can be assayed by introducing the nucleic acid
reaction products into suitable bacterial host cells (e.g., E.
coli K-12 DH5α cells (recA-); commercially available from Life
Technologies (Gibco-BRL)) preferably by electroporation,
described by Dower et al., Nucl. Acids Res. 16:6127 (1988), and
monitoring for evidence of transposition, as is described
elsewhere herein.

Those persons skilled in the art will appreciate that
apart from the changes noted herein, the transposition reaction
can proceed under much the same conditions as would be found in
an in vivo reaction. Yet, the modified transposase described
herein so increases the level of transposition activity that it
is now possible to carry out this reaction in vitro where this
has not previously been possible. The rates of reaction are
even greater when the modified transposase is coupled with the
optimized buffer and temperature conditions noted herein.

In another aspect, the present invention is a preparation
of a modified Tn5 transposase enzyme that differs from wild
type Tn5 transposase in that it (1) binds to the repeat
sequences of the donor DNA with greater avidity than wild type
Tn5 transposase and (2) is less likely than the wild type
protein to assume an inactive multimeric form. An enzyme
meeting these requirements can be obtained from a bacterial host
cell containing an expressible gene for the modified enzyme
that is under the control of a promoter active in the host
cell. Genetic material that encodes the modified Tn5
transposase can be introduced (e.g., by electroporation) into
suitable bacterial host cells capable of supporting expression
of the genetic material. Known methods for overproducing and
preparing other Tn5 transposase mutants are suitably employed.
For example, Weinreich, M. D., et al., supra, describes a
suitable method for overproducing a Tn5 transposase. A second
method for purifying Tn5 transposase was described in de la





PCT/US97/15941
WO 98/10077 

Cruz, N. B., et al., "Characterization of the Tn5 Transposase
and Inhibitor Proteins: A Model for the Inhibition of
Transposition," J. Bact. 175:6932-6938 (1993), also
incorporated herein by reference. It is noted that induction
can be carried out at temperatures below 37°C, which is the
temperature used by de la Cruz, et al. Temperatures at least in
the range of 33 to 37°C are suitable. The inventors have
determined that the method for preparing the modified
transposase of the present invention is not critical to success
of the method, as various preparation strategies have been used
with equal success.

Alternatively, the protein can be chemically synthesized,
in a manner known to the art, using the amino acid sequence
attached hereto as SEQ ID NO: 2 as a guide. It is also possible
to prepare a genetic construct that encodes the modified
protein (and associated transcription and translation signals)
by using standard recombinant DNA methods familiar to molecular
biologists. The genetic material useful for preparing such
constructs can be obtained from existing Tn5 constructs, or can
be prepared using known methods for introducing mutations into
genetic material (e.g., random mutagenesis PCR or site-directed
mutagenesis) or some combination of both methods. The genetic
sequence that encodes the protein shown in SEQ ID NO: 2 is set
forth in SEQ ID NO: 1.

The nucleic acid and amino acid sequences of wild type Tn5
transposase are known and published. N.C.B.I. Accession Number
U00004 L19385, incorporated herein by reference.

In a preferred embodiment, the improved avidity of the
modified transposase for the repeat sequences of the OE termini
(class (1) mutation) can be achieved by providing a lysine
residue at amino acid 54, which is glutamic acid in wild type
Tn5 transposase. The mutation strongly alters the preference
of the transposase for OE termini, as opposed to inside end
("IE") termini. The higher binding of this mutation, known as
EK54, to OE termini results in a transposition rate that is
about 10-fold higher than is seen with wild type transposase.
A similar change at position 54 to valine (mutant EV54) also
results in somewhat increased binding/transposition for OE
termini, as does a threonine-to-proline change at position 47
(mutant TP47; about 10-fold higher). It is believed that
other, comparable transposase mutations (in one or more amino
acids) that increase binding avidity for OE termini may also be
obtained which would function as well or better in the in vitro
assay described herein.
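
The mutant names used above follow a compact pattern: the one-letter code of the wild type residue, the one-letter code of the replacement residue, and the amino acid position (so EK54 is glutamic acid to lysine at position 54). The short sketch below decodes such labels. It is offered only as a reading aid; the function and type names are hypothetical, and the residue table covers only the amino acids named in the preceding paragraph.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the wild type residue
    mutant: str      # one-letter code of the replacement residue
    position: int    # amino acid position in the transposase


# Only the residues mentioned in the paragraph above.
RESIDUE_NAMES = {
    "E": "glutamic acid",
    "K": "lysine",
    "V": "valine",
    "T": "threonine",
    "P": "proline",
}


def parse_mutation(label: str) -> Substitution:
    """Split a label such as 'EK54' into (wild type, mutant, position)."""
    match = re.fullmatch(r"([A-Z])([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return Substitution(match.group(1), match.group(2), int(match.group(3)))


def describe(label: str) -> str:
    """Spell out a label, e.g. 'threonine to proline at position 47'."""
    sub = parse_mutation(label)
    return (f"{RESIDUE_NAMES[sub.wild_type]} to {RESIDUE_NAMES[sub.mutant]} "
            f"at position {sub.position}")
```

Under this convention, `describe("EV54")` reads as glutamic acid to valine at position 54, matching the description given in the text.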

One of ordinary skill will also appreciate that changes to
the nucleotide sequences of the short repeat sequences of the
donor DNA may coordinate with other mutation(s) in or near the
binding region of the transposase enzyme to achieve the same
increased binding effect, and the resulting 5- to 50-fold
increase in transposition rate. Thus, while the applicants
have exemplified one case of a mutation that improves binding
of the exemplified transposase, it will be understood that
other mutations in the transposase, or in the short repeat
sequences, or in both, will also yield transposases that fall
within the scope and spirit of the present invention. A
suitable method for determining the relative avidity for Tn5 OE
termini has been published by Jilk, R. A., et al., "The
Organization of the Outside end of Transposon Tn5," J. Bact.
178:1671-79 (1996).

The transposase of the present invention is also less
likely than the wild type protein to assume an inactive
multimeric form. In the preferred embodiment, that class (2)
mutation from wild type can be achieved by modifying amino acid
372 (leucine) of wild type Tn5 transposase to a proline (and,
likewise, by modifying the corresponding DNA to encode proline).
This mutation, referred to as LP372, has previously been
characterized as a mutation in the dimerization region of the
transposase. Weinreich, et al., supra. It was noted by
Weinreich et al. that this mutation at position 372 maps to a
region shown previously to be critical for interaction with an
inhibitor of Tn5 transposition. The inhibitor is a protein
encoded by the same gene that encodes the transposase, but
which is truncated at the N-terminal end of the protein,
relative to the transposase. The approach of Weinreich et al.
for determining the extent to which multimers are formed is
suitable for determining whether a mutation falls within the
scope of this element.

It is thought that when wild type Tn5 transposase
multimerizes, its activity in trans is reduced. Presumably, a
mutation in the dimerization region reduces or prevents
multimerization, thereby reducing inhibitory activity and
leading to levels of transposition 5- to 50-fold higher than
are seen with the wild type transposase. The LP372 mutation
achieves about 10-fold higher transposition levels than wild
type. Likewise, other mutations (including mutations at one
or more amino acids) that reduce the ability of the transposase
to multimerize would also function in the same manner as the
single mutation at position 372, and would also be suitable in
a transposase of the present invention. It may also be
possible to reduce the ability of a Tn5 transposase to
multimerize without altering the wild type sequence in the
so-called dimerization region, for example by adding into the
system another protein or non-protein agent that blocks the
dimerization site. Alternatively, the dimerization region
could be removed entirely from the transposase protein.

As was noted above, the inhibitor protein, encoded in
partially overlapping sequence with the transposase, can
interfere with transposase activity. As such, it is desired
that the amount of inhibitor protein be reduced over the amount
observed in wild type in vivo. For the present assay, the
transposase is used in purified form, and it may be possible to
separate the transposase from the inhibitor (for example,
according to differences in size) before use. However, it is
also possible to genetically eliminate the possibility of
having any contaminating inhibitor protein present by removing
its start codon from the gene that encodes the transposase.

An AUG in the wild type Tn5 transposase gene that encodes 
methionine at transposase amino acid 56 is the first codon of 
the inhibitor protein. However, it has already been shown that 
replacement of the methionine at position 56 has no apparent 
effect upon the transposase activity, but at the same time 
prevents translation of the inhibitor protein, thus resulting
in a somewhat higher transposition rate. Weigand, T. W. and W.
S. Reznikoff, "Characterization of Two Hypertransposing Tn5
Mutants," J. Bact. 174:1229-1239 (1992), incorporated herein by
reference. In particular, the present inventors have replaced
the methionine with an alanine in the preferred embodiment (and
have replaced the methionine-encoding AUG codon with an
alanine-encoding GCC). A preferred transposase of the present
invention therefore includes an amino acid other than
methionine at amino acid position 56, although this change can
be considered merely technically advantageous (since it ensures
the absence of the inhibitor from the in vitro system) and not
essential to the invention (since other means can be used to
eliminate the inhibitor protein from the in vitro system).
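
The codon replacement described above (the methionine codon, AUG in the message and ATG in the DNA, replaced with the alanine codon GCC) amounts to a three-nucleotide substitution at a known codon position. The sketch below shows that operation on a short toy sequence. The toy sequence, the codon position used in the example, and the function names are hypothetical and chosen only for illustration; the actual coding sequence is the one set forth in SEQ ID NO: 1 and is not reproduced here.

```python
def codon_at(cds: str, position: int) -> str:
    """Return the codon at a 1-based codon position of a coding sequence."""
    start = 3 * (position - 1)
    if position < 1 or start + 3 > len(cds):
        raise ValueError("codon position is outside the sequence")
    return cds[start:start + 3]


def replace_codon(cds: str, position: int, new_codon: str) -> str:
    """Return a copy of cds with the codon at `position` replaced."""
    if len(new_codon) != 3:
        raise ValueError("a codon must be exactly three nucleotides")
    codon_at(cds, position)  # validates the position
    start = 3 * (position - 1)
    return cds[:start] + new_codon + cds[start + 3:]


# Toy coding sequence (hypothetical): Met-Glu-Met-Leu.
TOY_CDS = "ATGGAAATGCTG"

# Replace the internal methionine codon (codon 3 of the toy sequence)
# with the alanine codon GCC, by analogy with the change at position 56.
TOY_MUTANT = replace_codon(TOY_CDS, 3, "GCC")
```

In the toy example the internal ATG disappears while the reading frame and all other codons are left unchanged, which is the property the text relies on to prevent translation of the inhibitor protein.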

The most preferred transposase amino acid sequence known 
to the inventors differs from the wild type at amino acid 
positions 54, 56, and 372. The mutations at positions 54 and 
372 separately contribute approximately a 10-fold increase to 
the rate of transposition reaction in vivo. When the mutations 
are combined using standard recombinant techniques into a 
single molecule containing both classes of mutations, reaction 
rates at least about 100-fold higher than can be achieved
using wild type transposase are observed when the products of 
the in vitro system are tested in vivo. The mutation at 
position 56 does not directly affect the transposase activity. 
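
The figures given above (about 10-fold for each of the two mutation classes alone, and at least about 100-fold for the combination) are consistent with the two effects multiplying. The sketch below illustrates only that arithmetic. The multiplicative model is an assumption made for illustration, not a mechanism asserted in the text, and the function name is hypothetical.

```python
def combined_fold(*fold_increases: float) -> float:
    """Multiply individual fold increases (assumes independent effects)."""
    result = 1.0
    for fold in fold_increases:
        result *= fold
    return result


# Approximate values from the text: about 10-fold each for the
# position 54 and position 372 mutations, measured in vivo.
ESTIMATE = combined_fold(10.0, 10.0)
```

Such a model does not hold for every pair of mutations: the text notes elsewhere that a transposase carrying both the EK110 and EK345 mutations was less active than one carrying either mutation alone.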

Other mutants from wild type that are contemplated to be 
likely to contribute to high transposase activity in vitro 
include, but are not limited to, glutamic acid-to-lysine at
position 110, and glutamic acid-to-lysine at position 345.

It is, of course, understood that other changes apart from 
these noted positions can be made to the modified transposase 
(or to a construct encoding the modified transposase) without 
adversely affecting the transposase activity. For example, it 
is well understood that a construct encoding such a transposase 
could include changes in the third position of codons such that 
the encoded amino acid does not differ from that described 
herein. In addition, certain codon changes have little or no 
functional effect upon the transposition activity of the
encoded protein. Finally, other changes may be introduced
which provide yet higher transposition activity in the encoded
protein. It is also specifically envisioned that mutations
can be combined to encode a modified transposase
having even higher transposition activity than has been
exemplified herein. All of these changes are within the scope
of the present invention. It is noted, however, that a
modified transposase containing the EK110 and EK345 mutations
(both described by Weigand and Reznikoff, supra) had lower
transposase activity than a transposase containing either
mutation alone.

After the enzyme is prepared and purified, as described 
supra, it can be used in the in vitro transposition reaction 
described above to introduce any desired transposable element 
from a donor DNA into a target DNA. The donor DNA can be 
circular or can be linear. If the donor DNA is linear, it is 
preferred that the repeat sequences flanking the transposable 
element should not be at the termini of the linear fragment but 
should rather include some DNA upstream and downstream from the 
region flanked by the repeat sequences. 

As was noted above, Tn5 transposition requires a pair of 
eighteen or nineteen base long termini. The wild type Tn5
outside end (OE) sequence (5'-CTGACTCTTATACACAAGT-3') (SEQ ID
NO: 7) has been described. It has been discovered that a
transposase-catalyzed in vitro transposition frequency at least 
as high as that of wild type OE is achieved if the termini in a 
construct include bases ATA at positions 10, 11, and 12, 
respectively, as well as the nucleotides in common between wild 
type OE and IE (e.g., at positions 1-3, 5-9, 13, 14, 16, and 
optionally 19). The nucleotides at positions 4, 15, 17, and 18 
can correspond to the nucleotides found at those positions in 
either wild type OE or wild type IE. It is noted that the 
transposition frequency can be enhanced over that of wild type 
OE if the nucleotide at position 4 is a T. The importance of 
these particular bases to transposition frequency has not 
previously been identified. 
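
The positional requirements stated above can be expressed as a simple check on a candidate terminus. The sketch below is a simplified illustration with hypothetical function names. It tests for ATA at positions 10-12 and for the wild type OE nucleotide at the positions the text lists as shared between OE and IE (1-3, 5-9, 13, 14, and 16). Positions 4, 15, 17, and 18 are deliberately left unchecked, because the wild type IE sequence needed to constrain them is not reproduced in this passage, and position 19 is treated as optional.

```python
WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7, written 5' to 3'

# 1-based positions listed in the text as shared by wild type OE and IE
# (position 19 is optional and is not checked here).
SHARED_POSITIONS = (1, 2, 3, 5, 6, 7, 8, 9, 13, 14, 16)


def meets_stated_requirements(terminus: str) -> bool:
    """Check ATA at positions 10-12 and the shared OE/IE positions."""
    seq = terminus.upper()
    if len(seq) not in (18, 19):
        return False
    if seq[9:12] != "ATA":          # positions 10, 11, 12
        return False
    return all(seq[p - 1] == WILD_TYPE_OE[p - 1] for p in SHARED_POSITIONS)


def has_t_at_position_4(terminus: str) -> bool:
    """True if position 4 is T, which the text associates with enhancement."""
    return terminus.upper()[3] == "T"
```

As expected, the wild type OE sequence itself passes `meets_stated_requirements` but carries an A rather than a T at position 4.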
It is noted that these changes are not intended to
encompass every desirable modification to OE. As is described
elsewhere herein, these attributes of acceptable termini
modifications were identified by screening mutants having
randomized differences between IE and OE termini. While the
presence in the termini of certain nucleotides is shown herein
to be advantageous, other desirable terminal sequences may yet
be obtained by screening a larger array of degenerate mutants
that include changes at positions other than those tested
herein as well as mutants containing nucleotides not tested in
the described screening. In addition, it is clear to one
skilled in the art that if a different transposase is used, it
may still be possible to select other variant termini, more
compatible with that particular transposase.

Among the mutants shown to be desirable and within the
scope of the invention are two hyperactive mutant OE sequences
that were identified in vivo. Although presented here as
single stranded sequences, in fact, the wild type and mutant OE
sequences include complementary second strands. The first
hyperactive mutant, 5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO: 8),
differs from the wild type OE sequence at positions 4, 17, and
18, counting from the 5' end, but retains ATA at positions
10-12. The second, 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO: 9),
differs from the wild type OE sequence at positions 4, 15, 17,
and 18, but also retains ATA at positions 10-12. These two
hyperactive mutant OE sequences differ from one another only at
position 15, where either G or C is present. OE-like activity
(or higher activity) is observed in a mutant sequence when it
contains ATA at positions 10, 11 and 12. It may be possible to
reduce the length of the OE sequence from 19 to 18 nucleotide
pairs with little or no effect.
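
The position-by-position differences stated above can be confirmed directly from the three sequences, and the complementary second strand can be derived in the same way. The following sketch does both. The function names are hypothetical; the sequences are those given in the text as SEQ ID NOS: 7, 8, and 9.

```python
WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
MUTANT_OE_1 = "CTGTCTCTTATACACATCT"   # SEQ ID NO: 8
MUTANT_OE_2 = "CTGTCTCTTATACAGATCT"   # SEQ ID NO: 9

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 1-based positions (from the 5' end) where two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


print(differing_positions(WILD_TYPE_OE, MUTANT_OE_1))  # [4, 17, 18]
print(differing_positions(WILD_TYPE_OE, MUTANT_OE_2))  # [4, 15, 17, 18]
print(differing_positions(MUTANT_OE_1, MUTANT_OE_2))   # [15]
```

Both mutant sequences carry ATA at positions 10-12, which can be checked with `MUTANT_OE_1[9:12]` and `MUTANT_OE_2[9:12]`.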

When one of the identified hyperactive mutant OE sequences
flanks a substrate DNA, the in vivo transposition frequency of
EK54/MA56 transposase is increased approximately 40- to 60-fold
over the frequency that is observed when wild type OE termini
flank the transposable DNA. The EK54/MA56 transposase is
already known to have an in vivo transposition frequency
approximately 8- to 10-fold higher than wild type transposase,
using wild type OE termini. Tn5 transposase having the
EK54/MA56 mutation is known to bind with greater avidity to OE
and with lesser avidity to the Tn5 inside ends (IE) than wild
type transposase.

A suitable mutant terminus in a construct for use in the 
assays of the present invention is characterized biologically 
as yielding more papillae per colony in a comparable time, say 
68 hours, than is observed in colonies harboring wild type OE 
in a comparable plasmid. Wild type OE can yield about 100 
papillae per colony when measured 68 hours after plating in a 
papillation assay using EK54/MA56 transposase, as is described
elsewhere herein. A preferred mutant would yield between about 
200 and 3000 papillae per colony, and a more preferred mutant 
between about 1000 and 3000 papillae per colony, when measured 
in the same assay and time frame. A most preferred mutant 
would yield between about 2000 and 3000 papillae per colony 
when assayed under the same conditions. Papillation levels may
be even greater than 3000 per colony, although it is difficult 
to quantitate at such levels. 
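
The tiers recited above can be expressed as a simple lookup. The following is a minimal sketch only: the function name and returned labels are hypothetical, the "about" boundaries are treated as hard cut-offs for illustration, and counts above 3000 are assigned to the highest tier on the assumption that such levels are at least as desirable.

```python
def classify_terminus(papillae_per_colony, wild_type_level=100):
    """Map a papillae-per-colony count (68 h, EK54/MA56 assay) to the tiers in the text.

    wild_type_level is the approximate wild type OE value (about 100) from the text.
    """
    n = papillae_per_colony
    if n < 0:
        raise ValueError("count must be non-negative")
    if n >= 2000:
        return "most preferred"
    if n >= 1000:
        return "more preferred"
    if n >= 200:
        return "preferred"
    if n > wild_type_level:
        return "suitable"
    return "comparable to or below wild type OE"


if __name__ == "__main__":
    for count in (100, 150, 500, 1500, 2500):
        print(count, "->", classify_terminus(count))
```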

Transposition frequency is also substantially enhanced in 
the in vitro transposition assay of the present invention when 
substrate DNA is flanked by a preferred mutant OE sequence and 
a most preferred mutant transposase (comprising EK54/MA56/LP372 
mutations) is used. Under those conditions, essentially all of 
the substrate DNA is converted into transposition products. 

The rate of in vitro transposition observed using the 
hyperactive termini is sufficiently high that, in the 
experience of the inventors, there is no need to select for 
transposition events. All colonies selected at random after 
transformation for further study have shown evidence of 
transposition events. 

This advance can represent a significant savings in time 
and laboratory effort. For example, it is particularly 
advantageous to be able to improve in vitro transposition 
frequency by modifying DNA rather than by modifying the 
transposase because as transposase activity increases in host
cells, there is an increased likelihood that cells containing
the transposase are killed during growth as a result of 
aberrant DNA transpositions. In contrast, DNA of interest 
containing the modified OE termini can be grown in sources 
completely separate from the transposase, thus not putting the 
host cells at risk. 

Without intending to limit the scope of this aspect of 
this invention, it is apparent that the tested hyperactive 
termini do not bind with greater avidity to the transposase 
than do wild type OE termini. Thus, the higher transposition 
frequency brought about by the hyperactive termini is not due 
to enhanced binding to transposase. 

The transposable element between the termini can include 
any desired nucleotide sequence. The length of the 
transposable element between the termini should be at least 
about 50 base pairs, although smaller inserts may work. No 
upper limit to the insert size is known. However, it is known 
that a donor DNA portion of about 300 nucleotides in length can 
function well. By way of non-limiting examples, the 
transposable element can include a coding region that encodes a 
detectable or selectable protein, with or without associated 
regulatory elements such as promoter, terminator, or the like. 

If the element includes such a detectable or selectable 
coding region without a promoter, it will be possible to 
identify and map promoters in the target DNA that are uncovered 
by transposition of the coding region into a position 
downstream thereof, followed by analysis of the nucleic acid 
sequences upstream from the transposition site. 

Likewise, the element can include a primer binding site 
that can be transposed into the target DNA, to facilitate 
sequencing methods or other methods that rely upon the use of 
primers distributed throughout the target genetic material. 
Similarly, the method can be used to introduce a desired
restriction enzyme site or polylinker, or a site suitable for
another type of recombination, such as a cre-lox site, into the
target.

The invention can be better understood upon consideration
of the following examples, which are intended to be exemplary
and not limiting on the invention.

EXAMPLES 

To obtain the transposase modified at position 54, the 
first third of the coding region from an existing DNA clone 
that encodes the Tn5 transposase but not the inhibitor protein 
(MA56) was mutagenized according to known methods and DNA 
fragments containing the mutagenized portion were cloned to 
produce a library of plasmid clones containing a full length 
transposase gene. The clones making up the library were 
transformed into E. coli K-12 strain MDW320 bacteria which were 
plated and grown into colonies. Transposable elements provided 
in the bacteria on a separate plasmid contained a defective 
lacZ gene. The separate plasmid, pOXgen386, was described by
Weinreich, M. et al., "A functional analysis of the Tn5
Transposase: Identification of Domains Required for DNA Binding
and Dimerization," J. Mol. Biol., 241:166-177 (1993),
incorporated herein by reference. Colonies having elevated
transposase activity were selected by screening for blue (LacZ)
spots in white colonies grown in the presence of X-gal. This
papillation assay was described by Weinreich, et al. (1993),
supra. The 5'-most third of the Tn5 transposase gene from each
such colony was sequenced to determine whether a mutation was
responsible for the increase in transposase activity. It was
determined that a mutation at position 54 to lysine (K)
correlated well with the increase in transposase activity.
Plasmid pRZ5412-EK54 contains lysine at position 54 as well as
the described alanine at position 56.

The fragment containing the LP372 mutation was isolated
from pRZ4870 (Weinreich et al. (1994)) using restriction enzymes
NheI and BglII, and was ligated into NheI-BglII cut pRZ5412-EK54
to form a recombinant gene having the mutations at
positions 54, 56 and 372, as described herein and shown in SEQ
ID NO: 1. The gene was tested and shown to have at least about
a one hundred fold increase in activity relative to wild type
Tn5 transposase. Each of the mutants at positions 54 and 372
alone had about a 10-fold increase in transposase activity.

The triple-mutant recombinant gene encoding the modified
transposase protein was transferred into commercial T7
expression vector pET-21D (commercially available from Novagen,
Madison, WI) by inserting a BspHI/SalI fragment into an NcoI/XhoI
fragment of the pET-21D vector. This cloning puts the modified
transposase gene under the control of the T7 promoter, rather
than the natural promoter of the transposase gene. The gene
product was overproduced in BL21(DE3)pLysS bacterial host
cells, which do not contain the binding site for the enzyme, by
specific induction in a fermentation process after cell growth
is complete (Studier, F. W., et al., "Use of T7 RNA
Polymerase to Direct Expression of Cloned Genes," Methods
Enzymol. 185:60-89 (1990)). The transposase was partially
purified using the method of de la Cruz, modified by inducing
overproduction at 33 or 37°C. After purification, the enzyme
preparation was stored at -70°C in a storage buffer (10%
glycerol, 0.7 M NaCl, 20 mM Tris-HCl, pH 7.5, 0.1% Triton-X100
and 10 mM CHAPS) until use. This storage buffer is to be
considered exemplary and not optimized.

A single plasmid (pRZTL1, Fig. 1) was constructed to serve
as both donor and target DNA in this Example. The complete
sequence of the pRZTL1 plasmid DNA is shown and described in
SEQ ID NO: 3. Plasmid pRZTL1 contains two Tn5 19 base pair OE
termini in inverted orientation to each other. Immediately
adjacent to one OE sequence is a gene that would encode
tetracycline resistance, but for the lack of an upstream
promoter. However, the gene is expressed if the tetracycline
resistance gene is placed downstream of a transcribed region
(e.g., under the control of the promoter that promotes
transcription of the chloramphenicol resistance gene also
present on pRZTL1). Thus, the test plasmid pRZTL1 can be
assayed in vivo after the in vitro reaction to confirm that
transposition has occurred. The plasmid pRZTL1 also includes
an origin of replication in the transposable element, which
ensures that all transposition products are plasmids that can
replicate after introduction in host cells.

The following components were used in typical 20 µl in
vitro transposition reactions:

Modified transposase: 2 µl (approximately 0.1 µg
enzyme/µl) in storage buffer (10% glycerol, 0.7 M NaCl, 20
mM Tris-HCl, pH 7.5, 0.1% Triton-X100 and 10 mM CHAPS)

Donor/Target DNA: 18 µl (approximately 1-2 µg) in
reaction buffer (at final reaction concentrations of 0.1 M
potassium glutamate, 25 mM Tris acetate, pH 7.5, 10 mM
Mg2+-acetate, 50 µg/ml BSA, 0.5 mM β-mercaptoethanol, 2 mM
spermidine, 100 µg/ml tRNA).

At 20°C, the transposase was combined with pRZTL1 DNA for
about 60 minutes. Then, the reaction volume was increased by
adding two volumes of reaction buffer and the temperature was
raised to 37°C for 2-3 hours whereupon cleavage and strand
transfer occurred.
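
The quantities in a typical reaction follow from simple arithmetic. The sketch below is illustrative only. It reads the enzyme stock as approximately 0.1 µg per µl and assumes that "two volumes" means two volumes of the starting 20 µl reaction; the function and key names are hypothetical.

```python
def reaction_summary(enzyme_ul=2.0, enzyme_ug_per_ul=0.1,
                     dna_ul=18.0, dilution_volumes=2.0):
    """Summarize enzyme mass and volumes before and after the dilution step."""
    start_ul = enzyme_ul + dna_ul                   # pre-incubation volume
    final_ul = start_ul * (1.0 + dilution_volumes)  # after adding buffer (assumed meaning)
    return {
        "enzyme_ug": enzyme_ul * enzyme_ug_per_ul,
        "start_volume_ul": start_ul,
        "final_volume_ul": final_ul,
        "dilution_factor": final_ul / start_ul,
    }


if __name__ == "__main__":
    for key, value in reaction_summary().items():
        print(f"{key}: {value:g}")
```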

Efficient in vitro transposition was shown to have 
occurred by in vivo and by in vitro methods. In vivo, many 
tetracycline-resistant colonies were observed after
transferring the nucleic acid product of the reaction into DH5α
bacterial cells. As noted, tetracycline resistance can only
arise in this system if the transposable element is transposed 
downstream from an active promoter elsewhere on the plasmid. A 
typical transposition frequency was 0.1% of cells that received 
plasmid DNA, as determined by counting chloramphenicol 
resistant colonies. However, this number underestimates the 
total transposition event frequency because the detection 
system limits the target to 1/16 of the total. 
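
A rough correction for the detection limit can be sketched as follows. This is an assumption-laden illustration, not a measured result: it supposes that insertions are distributed evenly over the target so that the observed frequency scales linearly with the detectable fraction. The function name is hypothetical.

```python
def corrected_frequency(observed_fraction, detectable_target_fraction=1 / 16):
    """Scale an observed frequency by the fraction of target sites that can be detected.

    Assumes uniform insertion across the target, which the text does not assert.
    """
    if not 0 < detectable_target_fraction <= 1:
        raise ValueError("detectable_target_fraction must be in (0, 1]")
    if observed_fraction < 0:
        raise ValueError("observed_fraction must be non-negative")
    return observed_fraction / detectable_target_fraction


if __name__ == "__main__":
    observed = 0.001  # 0.1% of cells that received plasmid DNA
    print(f"estimated total frequency: {corrected_frequency(observed):.1%}")
```

Under these assumptions the typical 0.1% figure would correspond to an overall event frequency on the order of 1.6%.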

Moreover, in vitro electrophoretic (1% agarose) and DNA
sequencing analyses of DNA isolated from purified colonies
revealed products of true transposition events, including both
intramolecular and intermolecular events. Results of typical
reactions using circular plasmid pRZTL1 substrates are shown in
Lanes 4 & 5 of Fig. 2. Lane 6 of Fig. 2 shows the results obtained
using linear plasmid pRZTL1 substrates.

The bands were revealed on 1% agarose gels by staining
with SYBR Green (FMC BioProducts) and were scanned on a
Fluorimager SI (Molecular Dynamics). In Figure 2, lane 1 shows
relaxed circle, linear, and closed circle versions of pRZTL1.
Lanes 2 and 3 show intramolecular and intermolecular
transposition products after in vitro transposition of pRZTL1,
respectively. The products were purified from electroporated
DH5α cells and were proven by size and sequence analysis to be
genuine transposition products. Lanes 4 and 5 represent
products of two independent in vitro reactions using a mixture
of closed and relaxed circular test plasmid substrates. In
lane 6, linear pRZTL1 (XhoI-cut) was the reaction substrate.
Lane 7 includes a BstEII digest of lambda DNA as a molecular
weight standard.

Fig. 3 reproduces lanes 4, 5, and 6 of Fig. 2 and shows an
analysis of various products, based upon secondary restriction
digest experiments and re-electroporation and DNA sequencing.
The released donor DNA corresponds to the fragment of pRZTL1
that contains the kanamycin resistance gene between the two OE
sequences, or, in the case of the linear substrate, the OE-XhoI
fragment. Intermolecular transposition products can be seen
only as relaxed DNA circles. Intramolecular transposition
products are seen as a ladder, which results from conversion of
the initial superhelicity of the substrate into DNA knots. The
reaction is efficient enough to achieve double transposition
events that are a combination of inter- and intramolecular
events.

A preliminary investigation was made into the nature of 
the termini involved in a transposition reaction. Wild type 
Tn5 OE and IE sequences were compared and an effort was 
undertaken to randomize the nucleotides at each of the seven 
positions of difference. A population of oligonucleotides 
degenerate at each position of difference was created. Thus, 
individual oligonucleotides in the population randomly included 
either the nucleotide of the wild type OE or the wild type IE 
sequence. In this scheme, 2^7 (128) distinct oligonucleotides
were synthesized using conventional tools. These






WO 98/10077 PCT/US97/15941 

oligonucleotides having sequence characteristics of both OE and 
IE are referred to herein as OE/IE-like sequences. To avoid 
nomenclature issues that arise because the oligonucleotides are 
intermediate between OE and IE wild type sequences, the 
applicants herein note that selected oligonucleotide sequences 
are compared to the wild type OE rather than to wild type IE, 
unless specifically noted. It will be appreciated by one 
skilled in the art that if IE is selected as the reference 
point, the differences are identical but are identified 
differently. 

The following depicts the positions (x) that were varied
in this mutant production scheme. WT OE is shown also at SEQ
ID NO: 7, WT IE at SEQ ID NO: 10.

5'-CTGACTCTTATACACAAGT-3' (WT OE)
      x     xxx  x xx     (positions of difference)
5'-CTGTCTCTTGATCAGATCT-3' (WT IE)

In addition to the degenerate OE/IE-like sequences, the
37-base-long synthetic oligonucleotides also included terminal
SphI and KpnI restriction enzyme recognition and cleavage sites
for convenient cloning of the degenerate oligonucleotides into
plasmid vectors. Thus, a library of randomized termini was
created from a population of 2^7 (128) types of degenerate
oligonucleotides.
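
The composition of such a library can be enumerated directly. The following is a minimal sketch, assuming only the 19-base OE/IE-like core; the flanking SphI and KpnI portions of the 37-base oligonucleotides are not reproduced because their full sequences are not recited here. The function name `hybrid_termini` is hypothetical.

```python
from itertools import product

WT_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
WT_IE = "CTGTCTCTTGATCAGATCT"  # SEQ ID NO: 10


def hybrid_termini(oe, ie):
    """Yield every sequence that carries either the OE or the IE base at each differing position."""
    variable = [i for i, (a, b) in enumerate(zip(oe, ie)) if a != b]
    for choice in product((False, True), repeat=len(variable)):
        seq = list(ie)  # start from the IE background
        for idx, use_oe in zip(variable, choice):
            if use_oe:
                seq[idx] = oe[idx]  # swap in the OE base at this position
        yield "".join(seq)


if __name__ == "__main__":
    library = list(hybrid_termini(WT_OE, WT_IE))
    print(len(library), "distinct OE/IE-like cores")
```

Both wild type ends and both preferred hyperactive termini are members of the enumerated set, since each is some combination of OE and IE bases at the seven positions of difference.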

Fig. 4 shows pRZ1496, the complete sequence of which is
presented as SEQ ID NO: 11. The following features are noted in
the sequence:

Feature               Position

WT OE                 94-112
LacZ coding           135-3137
LacY coding           3199-4486
LacA coding           4553-6295
tet^r coding          6669-9442
transposase coding    10683-12111 (Comp. Strand)
Cassette IE           12184-12225
colE1 sequence        127732-19182

The IE cassette shown in Fig. 4 was excised using SphI and
KpnI and was replaced, using standard cleavage and ligation
methods, by the synthetic termini cassettes comprising OE/IE-like
portions. Between the fixed wild type OE sequence and the
OE/IE-like cloned sequence, plasmid pRZ1496 comprises a gene
whose activity can be detected, namely LacZYA, as well as a
selectable marker gene, tet^r. The LacZ gene is defective in
that it lacks suitable transcription and translation initiation
signals. The LacZ gene is transcribed and translated only when
it is transposed into a position downstream from such signals.

The resulting clones were transformed using
electroporation into dam-, LacZ- bacterial cells, in this case
JCM101/pOXgen cells, which were grown at 37°C in LB medium under
standard conditions. A dam- strain is preferred because dam
methylation can inhibit IE utilization and wild type IE
sequences include two dam methylation sites. A dam- strain
eliminates dam methylation as a consideration in assessing
transposition activity. The Tet^r cells selected were LacZ-;
transposition-activated Lac expression was readily detectable
against a negative background. pOXgen is a non-essential F
factor derivative that need not be provided in the host cells.

In some experiments, the EK54/MA56 transposase was encoded
directly by the transformed pRZ1496 plasmid. In other
experiments, the pRZ1496 plasmid was modified by deleting a
unique HindIII/EagI fragment (nucleotides 9112-12083) from the
plasmid (see Fig. 4) to prevent transposase production. In the
latter experiments, the host cells were co-transformed with the
HindIII/EagI-deleted plasmid, termed pRZ5451 (Fig. 4), and with
an EK54/MA56 transposase-encoding chloramphenicol-resistant
plasmid. In some experiments, a comparable plasmid encoding a
wild type Tn5 transposase was used for comparison.

Transposition frequency was assessed by a papillation
assay that measured the number of blue spots (Lac producing
cells or "papillae") in an otherwise white colony. Transformed
cells were plated (approx. 50 colonies per plate) on Glucose
minimal Miller medium (Miller, J., Experiments in Molecular
Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
(1972)) containing 0.3% casamino acids,
5-bromo-4-chloro-3-indolyl-β-D-galactoside (40 µg/ml) and
phenyl-β-D-galactoside (0.05%). The medium contained
tetracycline (15 µg/ml) and, where needed, chloramphenicol
(20 µg/ml). Colonies that
survived the selection were evaluated for transposition
frequency in vivo. Although colonies exhibiting superior
papillation were readily apparent to the naked eye, the number
of blue spots per colony was determined over a period of
several days (approximately 90 hours post-plating).

To show that the high-papillation phenotype was conferred
by the end mutations in the plasmids, colonies were re-streaked
if they appeared to have papillation levels higher than was
observed when wild type IE was included on the plasmid.
Colonies from the streaked culture plates were
picked and cultured. DNA was obtained and purified
from the cultured cells, using standard protocols, and was
transformed again into "clean" JCM101/pOXgen cells.
Papillation levels were again compared with wild type
IE-containing plasmids in the above-noted assays, and consistent
results were observed.

To obtain DNA for sequencing of the inserted 
oligonucleotide, cultures were grown from white portions of 117 
hyperpapillating colonies, and DNA was prepared from each 
colony using standard DNA miniprep methods. The DNA sequence 
of the OE/IE-like portion of 117 clones was determined (42 from
transformations using pRZ1496 as the cloning vehicle; 75 from
transformations using pRZ5451 as the cloning vehicle). Only 29
unique mutants were observed. Many mutants were isolated
multiple times. All mutants that showed the highest
papillation frequencies contain OE-derived bases at positions
10, 11, and 12. When the OE-like bases at these positions were
maintained, it was impossible to measure the effect on
transposition of other changes, since the papillation level was
already extremely high.

One thousand five hundred seventy five colonies were
screened as described above. The likelihood that all 128
possible mutant sequences were screened was greater than 95%.
Thus, it is unlikely that other termini that contribute to a
greater transposition frequency will be obtained using the
tested transposase.
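
The coverage estimate can be illustrated with a standard occupancy calculation. The sketch below is a simplification offered under stated assumptions: each screened colony is treated as an independent, equally likely draw from the 128 possible sequences, which real oligonucleotide synthesis and cloning need not satisfy. The function name is hypothetical.

```python
from math import comb


def prob_all_variants_seen(n_variants, n_colonies):
    """Probability that every variant appears at least once among n_colonies uniform draws.

    Uses inclusion-exclusion over the number k of variants that are missed.
    """
    m = n_variants
    return sum((-1) ** k * comb(m, k) * (1 - k / m) ** n_colonies
               for k in range(m + 1))


if __name__ == "__main__":
    p = prob_all_variants_seen(128, 1575)
    print(f"P(all 128 sequences sampled in 1575 colonies) = {p:.4f}")
```

Under the uniform-sampling assumption the result lies above the 95% figure stated in the text; unequal representation of sequences in the library would lower it.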
Tables I and II report the qualitative papillation level
of mutant constructs carrying the indicated hybrid end 
sequences or the wild type OE or IE end sequences. In the 
tables, the sequence at each position of the terminus 
corresponds to wild type IE unless otherwise noted. The 
applicants intend that, while the sequences are presented in 
shorthand notation, one of ordinary skill can readily determine 
the complete 19 base pair sequence of every presented mutant, 
and this specification is to be read to include all such 
complete sequences. Table I includes data from trials where 
the EK54 transposase was provided in trans; Table II, from 
those trials where the EK54 transposase was provided in cis. 
Although a transposase provided in cis is more active in 
absolute terms than a transposase provided in trans, the cis or 
trans source of the transposase does not alter the relative in 
vivo transposition frequencies of the tested termini. 

Tables I and II show that every mutant that retains ATA at
positions 10, 11, and 12, respectively, had an activity
comparable to, or higher than, wild type OE, regardless of
whether the wild type OE activity was medium (Table I, trans)
or high (Table II, cis). Moreover, whenever that three-base
sequence in a mutant was not ATA, the mutant exhibited lower
papillation activity than wild type OE. It was also noted that
papillation is at least comparable to, and tends to be
significantly higher than, wild type OE when position 4 is a T.

Quantitative analysis of papillation levels was difficult,
beyond the comparative levels shown (very low, low, medium,
medium high, and high). However, one skilled in the art can
readily note the papillation level of OE and can recognize
those colonies having comparable or higher levels. It is
helpful to observe the papillae with magnification.

The number of observed papillae increased over time, as is
shown in Figs. 5-7, which roughly quantitate the papillation
observed in cells transformed separately with 9 clones
containing either distinct synthetic termini cassettes or wild
type OE or IE termini. In these 3 figures, each mutant is
identified by its differences from the wild type IE sequence.
Note that, among the tested mutants, only mutant 10A/11T/12A
had a higher papillation level than wild type OE.
That mutant (which would be called mutant 4/15/17/18 when OE is
the reference sequence) was the only mutant of those shown in
Figs. 5-7 that retained the nucleotides ATA at positions 10,
11, and 12. Figs. 5 (y-axis: 0 - 1500 papillae) and 6 (y-axis:
0 - 250 papillae) show papillation using various mutants plus
IE and OE controls and the EK54/MA56 enzyme. Fig. 7 (y-axis: 0
- 250 papillae) shows papillation when the same mutant
sequences were tested against the wild type (more properly,
MA56) transposase. The 10A/11T/12A mutant (SEQ ID NO: 9)
yielded significantly more papillae (approximately 3000) in a
shorter time (68 hours) with EK54/MA56 transposase than was
observed even after 90 hours with the WT OE (approximately
1500). A single OE-like nucleotide at position 15 on an IE-like
background also increased papillation frequency.

In vivo transposition frequency was also quantitated in a
tetracycline-resistance assay using two sequences having high
levels of hyperpapillation. These sequences were
5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO: 8), which differs from the
wild type OE sequence at positions 4, 17, and 18, counting from
the 5' end, and 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO: 9), which
differs from the wild type OE at positions 4, 15, 17, and 18.
These sequences are considered the preferred mutant termini in
an assay using a transposase that contains EK54/MA56 or a
transposase that contains MA56. Each sequence was separately
engineered into pRZTL1 in place of the plasmid's two wild type
OE sequences. A PCR-amplified fragment containing the desired
ends flanking the kanamycin resistance gene was readily cloned
into the large HindIII fragment of pRZTL1. The resulting
plasmids are identical to pRZTL1 except at the indicated
termini. For comparison, pRZTL1 and a derivative of pRZTL1
containing two wild type IE sequences were also tested. In the
assay, JCM101/pOXgen cells were co-transformed with a test
plasmid (pRZTL1 or derivative) and a high copy number amp^r
plasmid that encoded either the EK54/MA56 transposase or wild
type (MA56) transposase. The host cells become tetracycline
resistant only when a transposition event brings the Tet^r gene
into downstream proximity with a suitable transcriptional
promoter elsewhere on a plasmid or on the chromosome. The
total number of cells that received the test plasmids was
determined by counting chloramphenicol resistant, ampicillin
resistant colonies. Transposition frequency was calculated by
taking the ratio of tet^r/cam^r amp^r colonies. An approximately
40 to 60 fold increase over wild type OE in in vivo transposition
was observed when using either of the mutant termini and EK54/MA56
transposase. Of the two preferred mutant termini, the one
containing mutations at three positions relative to the wild
type OE sequence yielded a higher increase.
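
The frequency calculation described above reduces to a ratio of colony counts. The following is a minimal sketch; the function names are hypothetical and the colony counts in the usage example are invented solely to show the arithmetic, not drawn from the experiments reported here.

```python
def transposition_frequency(tet_resistant, cam_amp_resistant):
    """Ratio of tetracycline-resistant colonies to cells that received both plasmids."""
    if cam_amp_resistant <= 0:
        raise ValueError("need a positive count of cam/amp resistant colonies")
    if tet_resistant < 0:
        raise ValueError("colony counts must be non-negative")
    return tet_resistant / cam_amp_resistant


def fold_increase(test_frequency, reference_frequency):
    """Fold change of a test terminus over a reference terminus."""
    if reference_frequency <= 0:
        raise ValueError("reference frequency must be positive")
    return test_frequency / reference_frequency


if __name__ == "__main__":
    # Hypothetical counts, scaled to the same number of plated cells.
    wild_type = transposition_frequency(2, 100_000_000)
    mutant = transposition_frequency(100, 100_000_000)
    print(f"fold increase over wild type OE: {fold_increase(mutant, wild_type):.0f}")
```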

As is shown in Fig. 8, which plots the tested plasmid
against the transposition frequency (x 10^-8), little
transposition was seen when the test plasmid included two IE 
termini. Somewhat higher transposition was observed when the 
test plasmid included two OE termini, particularly when the 
EK54/MA56 transposase was employed. In striking contrast, the 
combination of the EK54/MA56 transposase with either of the 
preferred selected ends (containing OE-like bases only at 
positions 10, 11, and 12, or positions 10, 11, 12, and 15) 
yielded a great increase in in vivo transposition over wild 
type OE termini. 

The preferred hyperactive mutant terminus having the most
preferred synthetic terminus sequence 5'-CTGTCTCTTATACACATCT-3'
(SEQ ID NO: 8) was provided in place of both WT OE termini in
pRZTL1 (Fig. 4) and was tested in the in vitro transposition
assay of the present invention using the triple mutant
transposase described herein. This mutant terminus was chosen
for further in vitro analysis because its transposition
frequency was higher than for the second preferred synthetic
terminus and because it has no dam methylation sites, so dam
methylation no longer affects transposition frequency. In
contrast, the 4/15/17/18 mutant does have a dam methylation
site.
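
The dam site content of each terminus can be confirmed by searching for the dam recognition sequence, GATC. The sketch below is illustrative and considers only the 19-base terminus itself; sites that might span the junction with flanking DNA are outside its scope. The function name is hypothetical.

```python
DAM_SITE = "GATC"  # recognition sequence of dam methylase (palindromic)

TERMINI = {
    "WT OE (SEQ ID NO: 7)": "CTGACTCTTATACACAAGT",
    "WT IE (SEQ ID NO: 10)": "CTGTCTCTTGATCAGATCT",
    "SEQ ID NO: 8": "CTGTCTCTTATACACATCT",
    "SEQ ID NO: 9": "CTGTCTCTTATACAGATCT",
}


def count_dam_sites(seq):
    """Count GATC occurrences; one strand suffices because the site is palindromic."""
    return seq.upper().count(DAM_SITE)


if __name__ == "__main__":
    for name, seq in TERMINI.items():
        print(f"{name}: {count_dam_sites(seq)} dam site(s)")
```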

In a preliminary experiment, CHAPS was eliminated from the
reaction, but the pre-incubation step was used. The reaction
was pre-incubated for 1 hour at 20°C, then diluted two times,
and then incubated for 3 hours at 37°C. About 0.5 µg of DNA
and 0.4 µg of transposase were used. The transposition products
were observed on a gel. With the mutant termini, very little
of the initial DNA was observed. Numerous bands representing
primary and secondary transposition reaction products were
observed. The reaction mixtures were transformed into DH5α
cells and were plated on chloramphenicol-, tetracycline-, or
kanamycin-containing plates.

Six hundred forty chloramphenicol-resistant colonies were
observed. Although these could represent unreacted plasmid,
all such colonies tested (n=12) were sensitive to kanamycin,
which indicates a loss of donor backbone DNA. All twelve
colonies also included plasmids of varied size; 9 of the 12
were characterized as deletion-inversions, and the remaining 3
were simple deletions. Seventy nine tetracycline-resistant
colonies were observed, which indicated an activation of the
tet^r gene by transposition.

Eleven kanamycin resistant colonies were observed. This 
indicated a low percentage of remaining plasmids carrying the 
donor backbone DNA. 

In a second, similar test, about 1 µg of plasmid DNA and
0.2 µg of transposase were used. In this test, the reaction was
incubated without CHAPS at 37°C for 3 hours without
pre-incubation or dilution. Some initial DNA was observed in the
gel after the 3 hour reaction. After overnight incubation,
only transposition products were observed.

The 3 hour reaction products were transformed into DH5α
cells and plated as described. About 50% of the
chloramphenicol resistant colonies were sensitive to kanamycin 
and were presumably transposition products. 

The invention is not intended to be limited to the 
foregoing examples, but to encompass all such modifications and 
variations as come within the scope of the appended claims. 
It is envisioned that, in addition to the uses specifically 
noted herein, other applications will be apparent to the 
skilled molecular biologist. In particular, methods for
introducing desired mutations into prokaryotic or eukaryotic
DNA are very desirable. For example, at present it is
difficult to knock out a functional eukaryotic gene by 
homologous recombination with an inactive version of the gene 
that resides on a plasmid. The difficulty arises from the need 
to flank the gene on the plasmid with extensive upstream and 
downstream sequences. Using this system, however, an 
inactivating transposable element containing a selectable 
marker gene (e.g., neo) can be introduced in vitro into a 
plasmid that contains the gene that one desires to inactivate. 
After transposition, the products can be introduced into 
suitable host cells. Using standard selection means, one can 
recover only cell colonies that contain a plasmid having the 
transposable element. Such plasmids can be screened, for 
example by restriction analysis, to recover those that contain 
a disrupted gene. Such clones can then be introduced directly 
into eukaryotic cells for homologous recombination and 
selection using the same marker gene. 

Also, one can use the system to readily insert a PCR-amplified
DNA fragment into a vector, thus avoiding traditional
cloning steps entirely. This can be accomplished by (1)
providing a suitable pair of PCR primers containing OE termini
adjacent to the sequence-specific parts of the primers, (2)
performing standard PCR amplification of a desired nucleic acid
fragment, and (3) performing the in vitro transposition reaction
of the present invention using the double-stranded products of PCR
amplification as the donor DNA.
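
Step (1) can be sketched in a few lines. The following is an illustrative example only, under several assumptions: the terminus is placed at the 5' end of each primer, the 20-base annealing length is an arbitrary choice, the template sequence is invented, and practical primer design considerations such as melting temperature are ignored. All function names are hypothetical.

```python
WT_OE = "CTGACTCTTATACACAAGT"  # wild type OE (SEQ ID NO: 7)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(template, terminus=WT_OE, anneal_len=20):
    """Build forward and reverse primers with the terminus 5' of the template-specific part."""
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template shorter than annealing length")
    forward = terminus + template[:anneal_len]
    reverse = terminus + reverse_complement(template[-anneal_len:])
    return forward, reverse


def expected_amplicon(template, terminus=WT_OE):
    """Top strand of the PCR product: the template flanked by inverted termini."""
    return terminus + template.upper() + reverse_complement(terminus)


if __name__ == "__main__":
    # Hypothetical template, used only to exercise the functions.
    template = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTC"
    fwd, rev = design_primers(template)
    print("forward primer:", fwd)
    print("reverse primer:", rev)
    print("amplicon:", expected_amplicon(template))
```

Reading either strand of the expected product from its 5' end begins with the terminus sequence, which corresponds to the inverted orientation of the two ends described for the donor DNA.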
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Reznikoff, William S
Goryshin, Igor Y
Zhou, Hong

(ii) TITLE OF INVENTION: System for In Vitro Transposition
(iii) NUMBER OF SEQUENCES: 11

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Quarles & Brady
(B) STREET: 1 South Pinckney Street
(C) CITY: Madison
(D) STATE: WI
(E) COUNTRY: USA
(F) ZIP: 53703

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Berson, Bennett J
(B) REGISTRATION NUMBER: 37094
(C) REFERENCE/DOCKET NUMBER: 960296.94142

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 608/251-5000
(B) TELEFAX: 608-251-9166

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1534 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Gene encoding modified Tn5
transposase enzyme"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 93..1523

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CTGACTCTTA TACACAAGTA GCGTCCTGAA CGGAACCTTT CCCGTTTTCC AGGATCTGAT
CTTCCATGTG ACCTCCTAAC ATGGTAACGT TC ATG ATA ACT TCT GCT CTT CAT
Met Ile Thr Ser Ala Leu His
1 5

CGT GCG GCC GAC TGG GCT AAA TCT GTG TTC TCT TCG GCG GCG CTG GGT 161
Arg Ala Ala Asp Trp Ala Lys Ser Val Phe Ser Ser Ala Ala Leu Gly
10 15 20

GAT CCT CGC CGT ACT GCC CGC TTG GTT AAC GTC GCC GCC CAA TTG GCA 209
Asp Pro Arg Arg Thr Ala Arg Leu Val Asn Val Ala Ala Gln Leu Ala
25 30 35

AAA TAT TCT GGT AAA TCA ATA ACC ATC TCA TCA GAG GGT AGT AAA GCC 257
Lys Tyr Ser Gly Lys Ser Ile Thr Ile Ser Ser Glu Gly Ser Lys Ala
40 45 50 55

GCC CAG GAA GGC GCT TAC CGA TTT ATC CGC AAT CCC AAC GTT TCT GCC 305
Ala Gln Glu Gly Ala Tyr Arg Phe Ile Arg Asn Pro Asn Val Ser Ala
60 65 70

GAG GCG ATC AGA AAG GCT GGC GCC ATG CAA ACA GTC AAG TTG GCT CAG 353
Glu Ala Ile Arg Lys Ala Gly Ala Met Gln Thr Val Lys Leu Ala Gln
75 80 85

GAG TTT CCC GAA CTG CTG GCC ATT GAG GAC ACC ACC TCT TTG AGT TAT 401
Glu Phe Pro Glu Leu Leu Ala Ile Glu Asp Thr Thr Ser Leu Ser Tyr
90 95 100

CGC CAC CAG GTC GCC GAA GAG CTT GGC AAG CTG GGC TCT ATT CAG GAT 449
Arg His Gln Val Ala Glu Glu Leu Gly Lys Leu Gly Ser Ile Gln Asp
105 110 115

AAA TCC CGC GGA TGG TGG GTT CAC TCC GTT CTC TTG CTC GAG GCC ACC 497
Lys Ser Arg Gly Trp Trp Val His Ser Val Leu Leu Leu Glu Ala Thr
120 125 130 135

ACA TTC CGC ACC GTA GGA TTA CTG CAT CAG GAG TGG TGG ATG CGC CCG 545
Thr Phe Arg Thr Val Gly Leu Leu His Gln Glu Trp Trp Met Arg Pro
140 145 150

GAT GAC CCT GCC GAT GCG GAT GAA AAG GAG AGT GGC AAA TGG CTG GCA 593
Asp Asp Pro Ala Asp Ala Asp Glu Lys Glu Ser Gly Lys Trp Leu Ala
155 160 165

GCG GCC GCA ACT AGC CGG TTA CGC ATG GGC AGC ATG ATG AGC AAC GTG 641
Ala Ala Ala Thr Ser Arg Leu Arg Met Gly Ser Met Met Ser Asn Val
170 175 180

ATT GCG GTC TGT GAC CGC GAA GCC GAT ATT CAT GCT TAT CTG CAG GAC 689
Ile Ala Val Cys Asp Arg Glu Ala Asp Ile His Ala Tyr Leu Gln Asp
185 190 195

AGG CTG GCG CAT AAC GAG CGC TTC GTG GTG CGC TCC AAG CAC CCA CGC 737
Arg Leu Ala His Asn Glu Arg Phe Val Val Arg Ser Lys His Pro Arg
200 205 210 215

AAG GAC GTA GAG TCT GGG TTG TAT CTG ATC GAC CAT CTG AAG AAC CAA 785
Lys Asp Val Glu Ser Gly Leu Tyr Leu Ile Asp His Leu Lys Asn Gln
220 225 230

CCG GAG TTG GGT GGC TAT CAG ATC AGC ATT CCG CAA AAG GGC GTG GTG 833
Pro Glu Leu Gly Gly Tyr Gln Ile Ser Ile Pro Gln Lys Gly Val Val
235 240 245

GAT AAA CGC GGT AAA CGT AAA AAT CGA CCA GCC CGC AAG GCG AGC TTG 881
Asp Lys Arg Gly Lys Arg Lys Asn Arg Pro Ala Arg Lys Ala Ser Leu
250 255 260

AGC CTG CGC AGT GGG CGC ATC ACG CTA AAA CAG GGG AAT ATC ACG CTC 929 
Ser Leu Arg Ser Gly Arg He Thr Leu Lys Gin Gly Asn He Thr Leu 
265 270 * 275 
AAC GCG GTG CTG GCC GAG GAG ATT AAC CCG CCC AAG GGT GAG ACC CCG 977
Asn Ala Val Leu Ala Glu Glu Ile Asn Pro Pro Lys Gly Glu Thr Pro
280 285 290 295

TTG AAA TGG TTG TTG CTG ACC GGC GAA CCG GTC GAG TCG CTA GCC CAA 1025
Leu Lys Trp Leu Leu Leu Thr Gly Glu Pro Val Glu Ser Leu Ala Gln
300 305 310

GCC TTG CGC GTC ATC GAC ATT TAT ACC CAT CGC TGG CGG ATC GAG GAG 1073
Ala Leu Arg Val Ile Asp Ile Tyr Thr His Arg Trp Arg Ile Glu Glu
315 320 325

TTC CAT AAG GCA TGG AAA ACC GGA GCA GGA GCC GAG AGG CAA CGC ATG 1121
Phe His Lys Ala Trp Lys Thr Gly Ala Gly Ala Glu Arg Gln Arg Met
330 335 340

GAG GAG CCG GAT AAT CTG GAG CGG ATG GTC TCG ATC CTC TCG TTT GTT 1169
Glu Glu Pro Asp Asn Leu Glu Arg Met Val Ser Ile Leu Ser Phe Val
345 350 355

GCG GTC AGG CTG TTA CAG CTC AGA GAA AGC TTC ACG CCG CCG CAA GCA 1217
Ala Val Arg Leu Leu Gln Leu Arg Glu Ser Phe Thr Pro Pro Gln Ala
360 365 370 375

CTC AGG GCG CAA GGG CTG CTA AAG GAA GCG GAA CAC GTA GAA AGC CAG 1265
Leu Arg Ala Gln Gly Leu Leu Lys Glu Ala Glu His Val Glu Ser Gln
380 385 390

TCC GCA GAA ACG GTG CTG ACC CCG GAT GAA TGT CAG CTA CTG GGC TAT 1313
Ser Ala Glu Thr Val Leu Thr Pro Asp Glu Cys Gln Leu Leu Gly Tyr
395 400 405

CTG GAC AAG GGA AAA CGC AAG CGC AAA GAG AAA GCA GGT AGC TTG CAG 1361
Leu Asp Lys Gly Lys Arg Lys Arg Lys Glu Lys Ala Gly Ser Leu Gln
410 415 420

[nucleotide line illegible in source] 1409
Trp Ala Tyr Met Ala Ile Ala Arg Leu Gly Gly Phe Met Asp Ser Lys
425 430 435

CGA ACC GGA ATT GCC AGC TGG GGC GCC CTC TGG GAA GGT TGG GAA GCC 1457
Arg Thr Gly Ile Ala Ser Trp Gly Ala Leu Trp Glu Gly Trp Glu Ala
440 445 450 455

CTG CAA AGT AAA CTG GAT GGC TTT CTT GCC GCC AAG GAT CTG ATG GCG 1505
Leu Gln Ser Lys Leu Asp Gly Phe Leu Ala Ala Lys Asp Leu Met Ala
460 465 470

CAG GGG ATC AAG ATC TGA TCAAGAGACA G 1534
Gln Gly Ile Lys Ile *
475

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 477 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val
1 5 10 15

Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val
20 25 30

Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile
35 40 45

Ser Ser Glu Gly Ser Lys Ala Ala Gln Glu Gly Ala Tyr Arg Phe Ile
50 55 60

Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met
65 70 75 80

Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu
85 90 95

Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly
100 105 110

Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser
115 120 125

Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His
130 135 140

Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys
145 150 155 160

Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met
165 170 175

Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp
180 185 190

Ile His Ala Tyr Leu Gln Asp Arg Leu Ala His Asn Glu Arg Phe Val
195 200 205

Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu
210 215 220

Ile Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser
225 230 235 240

Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg
245 250 255

Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu
260 265 270

Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn
275 280 285

Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Gly Glu
290 295 300

Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr
305 310 315 320

His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala
325 330 335

Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met
340 345 350

Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu
355 360 365

Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu
370 375 380

Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp
385 390 395 400

Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys
405 410 415

Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu
420 425 430

Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala
435 440 445

Leu Trp Glu Gly Trp Glu Ala Leu Gln Ser Lys Leu Asp Gly Phe Leu
450 455 460

Ala Ala Lys Asp Leu Met Ala Gln Gly Ile Lys Ile *
465 470 475

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5838 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Plasmid DNA"

(vii) IMMEDIATE SOURCE:

(B) CLONE: pRZTL1

(ix) FEATURE:

(A) NAME/KEY: insertion_seq

(B) LOCATION: 1..19

(ix) FEATURE: 

(A) NAME/KEY: CDS

(D) OTHER INFORMATION: /function= "tetracycline resistance"

(ix) FEATURE:

(A) NAME/KEY: CDS

(D) OTHER INFORMATION: /function= "chloramphenicol resistance"

(ix) FEATURE: 

(A) NAME/KEY: insertion_seq 

(B) LOCATION: 4564..4582

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(D) OTHER INFORMATION: /function= "kanamycin resistance"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CTGACTCTTA TACACAAGTA AGCTTTAATG CGGTAGTTTA TCACAGTTAA ATTGCTAACG 

CAGTCAGGCA CCGTGT ATG AAA TCT AAC AAT GCG CTC ATC GTC ATC CTC
Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu
480 485

[illegible] CTG GAT GCT GTA GGC ATA GGC TTG GTT ATG CCG GTA 157
Gly Thr Val Thr Leu Asp Ala Val Gly Ile Gly Leu Val Met Pro Val
490 495 500

CTG CCG GGC CTC TTG CGG GAT ATC GTC CAT TCC GAC AGC ATC GCC AGT 205
Leu Pro Gly Leu Leu Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser
505 510 515 520

CAC TAT GGC GTG CTG CTA GCG CTA TAT GCG TTG ATG CAA TTT CTA TGC 253
His Tyr Gly Val Leu Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys
525 530 535

GCA CCC GTT CTC GGA GCA CTG TCC GAC CGC TTT GGC CGC CGC CCA GTC 301
Ala Pro Val Leu Gly Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val
540 545 550

CTG CTC GCT TCG CTA CTT GGA GCC ACT ATC GAC TAC GCG ATC ATG GCG 349
Leu Leu Ala Ser Leu Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala
555 560 565

[illegible] CCC GTC CTG TGG ATC CTC TAC GCC GGA CGC ATC GTG GCC GGC 397
Thr Thr Pro Val Leu Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly
570 575 580

[illegible] ACA GGT GCG GTT GCT GGC GCC TAT ATC GCC GAC ATC 445
Ile Thr Gly Ala Thr Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile
585 590 595 600

ACC GAT GGG GAA GAT CGG GCT CGC CAC TTC GGG CTC ATG AGC GCT TGT 493
Thr Asp Gly Glu Asp Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys
605 610 615

TTC GGC GTG GGT ATG GTG GCA GGC CCC GTG GCC GGG GGA CTG TTG GGC 541
Phe Gly Val Gly Met Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly
620 625 630

GCC ATC TCC TTG CAT GCA CCA TTC CTT GCG GCG GCG GTG CTC AAC GGC 589
Ala Ile Ser Leu His Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly
635 640 645

CTC AAC CTA CTA CTG GGC TGC TTC CTA ATG CAG GAG TCG CAT AAG GGA 637
Leu Asn Leu Leu Leu Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly
650 655 660

GAG CGT CGA CCG ATG CCC TTG AGA GCC TTC AAC CCA GTC AGC TCC TTC 685
Glu Arg Arg Pro Met Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe
665 670 675 680

CGG TGG GCG CGG GGC ATG ACT ATC GTC GCC GCA CTT ATG ACT GTC TTC 733
Arg Trp Ala Arg Gly Met Thr Ile Val Ala Ala Leu Met Thr Val Phe
685 690 695

TTT ATC ATG CAA CTC GTA GGA CAG GTG CCG GCA GCG CTC TGG GTC ATT 781
Phe Ile Met Gln Leu Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile
700 705 710

TTC GGC GAG GAC CGC TTT CGC TGG AGC GCG ACG ATG ATC GGC CTG TCG 829
Phe Gly Glu Asp Arg Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser
715 720 725

CTT GCG GTA TTC GGA ATC TTG CAC GCC CTC GCT CAA GCC TTC GTC ACT 877
Leu Ala Val Phe Gly Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr
730 735 740

GGT CCC GCC ACC AAA CGT TTC GGC GAG AAG CAG GCC ATT ATC GCC GGC 925
Gly Pro Ala Thr Lys Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly
745 750 755 760

ATG GCG GCC GAC GCG CTG GGC TAC GTC TTG CTG GCG TTC GCG ACG CGA 973
Met Ala Ala Asp Ala Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg
765 770 775

GGC TGG ATG GCC TTC CCC ATT ATG ATT CTT CTC GCT TCC GGC GGC ATC 1021
Gly Trp Met Ala Phe Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile
780 785 790

GGG ATG CCC GCG TTG CAG GCC ATG CTG TCC AGG CAG GTA GAT GAC GAC 1069
Gly Met Pro Ala Leu Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp
795 800 805

CAT CAG GGA CAG CTT CAA GGA TCG CTC GCG GCT CTT ACC AGC CTA ACT 1117
His Gln Gly Gln Leu Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr
810 815 820

TCG ATC ACT GGA CCG CTG ATC GTC ACG GCG ATT TAT GCC GCC TCG GCG 1165
Ser Ile Thr Gly Pro Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala
825 830 835 840

AGC ACA TGG AAC GGG TTG GCA TGG ATT GTA GGC GCC GCC CTA TAC CTT 1213
Ser Thr Trp Asn Gly Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu
845 850 855

GTC TGC CTC CCC GCG TTG CGT CGC GGT GCA TGG AGC CGG GCC ACC TCG 1261
Val Cys Leu Pro Ala Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser
860 865 870

ACC TGA ATGGAAGCCG GCGGCACCTC GCTAACGGAT TCACCACTCC AAGAATTGGA 1317
Thr *



GCCAATCAAT TCTTGCGGAG AACTGTGAAT GCGCAAACCA ACCCTTGGCA GAACATATCC 1377
ATCGCGTCCG CCATCTCCAG CAGCCGCACG CGGCGCATCT CGGGCAGCGT TGGGTCCTGG 1437
CCACGGGTGC GCATGATCGT GCTCCTGTCG TTGAGGACCC GGCTAGGCTG GCGGGGTTGC 1497
CTTACTGGTT AGCAGAATGA ATCACCGATA CGCGAGCGAA CGTGAAGCGA CTGCTGCTGC 1557
AAAACGTCTG CGACCTGAGC AACAACATGA ATGGTCTTCG GTTTCCGTGT TTCGTAAAGT 1617
CTGGAAACGC GGAAGTCCCC TACGTGCTGC TGAAGTTGCC CGCAACAGAG AGTGGAACCA 1677
ACCGGTGATA CCACGATACT ATGACTGAGA GTCAACGCCA TGAGCGGCCT CATTTCTTAT 1737
TCTGAGTTAC AACAGTCCGC ACCGCTGTCC GGTAGCTCCT TCCGGTGGGC GCGGGGCATG 1797
ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG ACAGGTGCCG 1857
GCAGCGCCCA ACAGTCCCCC GGCCACGGGG CCTGCCACCA TACCCACGCC GAAACAAGCG 1917
CCCTGCACCA TTATGTTCCG GATCTGCATC GCAGGATGCT GCTGGCTACC CTGTGGAACA 1977
CCTACATCTG TATTAACGAA GCGCTAACCG TTTTTATCAG GCTCTGGGAG GCAGAATAAA 2037
TGATCATATC GTCAATTATT ACCTCCACGG GGAGAGCCTG AGCAAACTGG CCTCAGGCAT 2097
TTGAGAAGCA CACGGTCACA CTGCTTCCGG TAGTCAATAA ACCGGTAAAC CAGCAATAGA 2157
CATAAGCGGC TATTTAACGA CCCTGCCCTG AACCGACGAC CGGGTCGAAT TTGCTTTCGA 2217
ATTTCTGCCA TTCATCCGCT TATTATCAAT TATTCAGGCG TAGCACCAGG CGTTTAAGGG 2277
CACCAATAAC TGCCTTAAAA AAATTACGCC CCGCCCTGCC ACTCATCGCA GTACTGTTGT 2337
AATTCATTAA GCATTCTGCC GACATGGAAG CCATCACAGA CGGCATGATG AACCTGAATC 2397
GCCAGCGGCA TCAGCACCTT GTCGCCTTGC GTATAATATT TGCCCATGGT GAAAACGGGG 2457

GCGAAGAAGT TGTCCATATT GGCCACGTTT AAATCAAAAC TGGTGAAACT CACCCAGGGA 2517 

TTGGCTGAGA CGAAAAACAT ATTCTCAATA AACCCTTTAG GGAAATAGGC CAGGTTTTCA 2577 

CCGTAACACG CCACATCTTG CGAATATATG TGTAGAAACT GCCGGAAATC GTCGTGGTAT 2637 

TCACTCCAGA GCGATGAAAA CGTTTCAGTT TGCTCATGGA AAACGGTGTA ACAAGGGTGA 2697 

ACACTATCCC ATATCACCAG CTCACCGTCT TTCATTGCCA TACGGAATTC CGGATGAGCA 2757 

TTCATCAGGC GGGCAAGAAT GTGAATAAAG GCCGGATAAA ACTTGTGCTT ATTTTTCTTT 2817 

ACGGTCTTTA AAAAGGCCGT AATATCCAGC TGAACGGTCT GGTTATAGGT ACATTGAGCA 2877 

ACTGACTGAA ATGCCTCAAA ATGTTCTTTA CGATGCCATT GGGATATATC AACGGTGGTA 2937 

TATCCAGTGA TTTTTTTCTC CATTTTAGCT TCCTTAGCTC CTGAAAATCT CGATAACTCA 2997 

AAAAATACGC CCGGTAGTGA TCTTATTTCA TTATGGTGAA AGTTGGAACC TCTTACGTGC 3057 

CGATCAACGT CTCATTTTCG CCAAAAGTTG GCCCAGGGCT TCCCGGTATC AACAGGGACA 3117 

CCAGGATTTA TTTATTCTGC GAAGTGATCT TCCGTCACAG GTATTTATTC GGCGCAAAGT 3177 

GCGTCGGGTG ATGCTGCCAA CTTACTGATT TAGTGTATGA TGGTGTTTTT GAGGTGCTCC 3237 

AGTGGCTTCT GTTTCTATCA GCTGTCCCTC CTGTTCAGCT ACTGACGGGG TGGTGCGTAA 3297 

CGGCAAAAGC ACCGCCGGAC ATCAGCGCTA GCGGAGTGTA TACTGGCTTA CTATGTTGGC 3357 

ACTGATGAGG GTGTCAGTGA AGTGCTTCAT GTGGCAGGAG AAAAAAGGCT GCACCGGTGC 3417 

GTCAGCAGAA TATGTGATAC AGGATATATT CCGCTTCCTC GCTCACTGAC TCGCTACGCT 3477

CGGTCGTTCG ACTGCGGCGA GCGGAAATGG CTTACGAACG GGGCGGAGAT TTCCTGGAAG 3537 

ATGCCAGGAA GATACTTAAC AGGGAAGTGA GAGGGCCGCG GCAAAGCCGT TTTTCCATAG 3597

GCTCCGCCCC CCTGACAAGC ATCACGAAAT CTGACGCTCA AATCAGTGGT GGCGAAACCC 3657 

GACAGGACTA TAAAGATACC AGGCGTTTCC CCTGGCGGCT CCCTCGTGCG CTCTCCTGTT 3717 

CCTGCCTTTC GGTTTACCGG TGTCATTCCG CTGTTATGGC CGCGTTTGTC TCATTCCACG 3777 

CCTGACACTC AGTTCCGGGT AGGCAGTTCG CTCCAAGCTG GACTGTATGC ACGAACCCCC 3837 

CGTTCAGTCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGAAAG 3897 

ACATGCAAAA GCACCACTGG CAGCAGCCAC TGGTAATTGA TTTAGAGGAG TTAGTCTTGA 3957 

AGTCATGCGC CGGTTAAGGC TAAACTGAAA GGACAAGTTT TGGTGACTGC GCTCCTCCAA 4017 

GCCAGTTACC TCGGTTCAAA GAGTTGGTAG CTCAGAGAAC CTTCGAAAAA CCGCCCTGCA 4077 

AGGCGGTTTT TTCGTTTTCA GAGCAAGAGA TTACGCGCAG ACCAAAACGA TCTCAAGAAG 4137

ATCATCTTAT TAATCAGATA AAATATTTCT AGAGGTGAAC CATCACCCTA ATCAAGTTTT 4197 

TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA AAGGGATGCC CCGATTTAGA 4257 

GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG 4317 

GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG TAACCACCAC ACCCGCCGCG 4377 

CTTAATGCGC CGCTACAGCG CCATTCGCCA TTCAGGCTGC GCAACTGTTG GGAAGGGCGA 4437 
TCGGTGCGGG CCTCTTCGCT ATTACGCCAG CTGGCGAAAG GGGGATGTGC TGCAAGGCGA 4497
TTAAGTTGGG TAACGCCAGG GTTTTCCCAG TCACGACGTT GTAAAACGAC GGCCAGTGCC 4557 
AAGCTTACTT GTGTATAAGA GTCAGTCGAC CTGCAGGGGG GGGGGGGAAA GCCACGTTGT 4617 
GTCTCAAAAT CTCTGATGTT ACATTGCACA AGATAAAAAT ATATCATCAT GAACAATAAA 
ACTGTCTGCT TACATAAACA GTAATACAAG GGGTGTT ATG AGC CAT ATT CAA CGG
Met Ser His Ile Gln Arg
1 5

GAA ACG TCT TGC TCG AGG CCG CGA TTA AAT TCC AAC ATG GAT GCT GAT 4780
Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn Ser Asn Met Asp Ala Asp
10 15 20

TTA TAT GGG TAT AAA TGG GCT CGC GAT AAT GTC GGG CAA TCA GGT GCG 4828
Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn Val Gly Gln Ser Gly Ala
25 30 35

ACA ATC TAT CGA TTG TAT GGG AAG CCC GAT GCG CCA GAG TTG TTT CTG 4876
Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp Ala Pro Glu Leu Phe Leu
40 45 50

AAA CAT GGC AAA GGT AGC GTT GCC AAT GAT GTT ACA GAT GAG ATG GTC 4924
Lys His Gly Lys Gly Ser Val Ala Asn Asp Val Thr Asp Glu Met Val
55 60 65 70

AGA CTA AAC TGG CTG ACG GAA TTT ATG CCT CTT CCG ACC ATC AAG CAT 4972
Arg Leu Asn Trp Leu Thr Glu Phe Met Pro Leu Pro Thr Ile Lys His
75 80 85

TTT ATC CGT ACT CCT GAT GAT GCA TGG TTA CTC ACC ACT GCG ATC CCC 5020
Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu Leu Thr Thr Ala Ile Pro
90 95 100

GGG AAA ACA GCA TTC CAG GTA TTA GAA GAA TAT CCT GAT TCA GGT GAA 5068
Gly Lys Thr Ala Phe Gln Val Leu Glu Glu Tyr Pro Asp Ser Gly Glu
105 110 115

AAT ATT GTT GAT GCG CTG GCA GTG TTC CTG CGC CGG TTG CAT TCG ATT 5116
Asn Ile Val Asp Ala Leu Ala Val Phe Leu Arg Arg Leu His Ser Ile
120 125 130

CCT GTT TGT AAT TGT CCT TTT AAC AGC GAT CGC GTA TTT CGT CTC GCT 5164
Pro Val Cys Asn Cys Pro Phe Asn Ser Asp Arg Val Phe Arg Leu Ala
135 140 145 150

CAG GCG CAA TCA CGA ATG AAT AAC GGT TTG GTT GAT GCG AGT GAT TTT 5212
Gln Ala Gln Ser Arg Met Asn Asn Gly Leu Val Asp Ala Ser Asp Phe
155 160 165

GAT GAC GAG CGT AAT GGC TGG CCT GTT GAA CAA GTC TGG AAA GAA ATG 5260
Asp Asp Glu Arg Asn Gly Trp Pro Val Glu Gln Val Trp Lys Glu Met
170 175 180

CAT AAG CTT TTG CCA TTC TCA CCG GAT TCA GTC GTC ACT CAT GGT GAT 5308
His Lys Leu Leu Pro Phe Ser Pro Asp Ser Val Val Thr His Gly Asp
185 190 195

TTC TCA CTT GAT AAC CTT ATT TTT GAC GAG GGG AAA TTA ATA GGT TGT 5356
Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu Gly Lys Leu Ile Gly Cys
200 205 210

ATT GAT GTT GGA CGA GTC GGA ATC GCA GAC CGA TAC CAG GAT CTT GCC 5404
Ile Asp Val Gly Arg Val Gly Ile Ala Asp Arg Tyr Gln Asp Leu Ala
215 220 225 230

ATC CTA TGG AAC TGC CTC GGT GAG TTT TCT CCT TCA TTA CAG AAA CGG 5452
Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser Pro Ser Leu Gln Lys Arg
235 240 245

CTT TTT CAA AAA TAT GGT ATT GAT AAT CCT GAT ATG AAT AAA TTG CAG 5500
Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro Asp Met Asn Lys Leu Gln
250 255 260

TTT CAT TTG ATG CTC GAT GAG TTT TTC TAA TCAGAATTGG TTAATTGGTT 5550
Phe His Leu Met Leu Asp Glu Phe Phe *
265 270 

GTAACACTGG CAGAGCATTA CGCTGACTTG ACGGGACGGC GGCTTTGTTG AATAAATCGA 5610 

ACTTTTGCTG AGTTGAAGGA TCAGATCACG CATCTTCCCG ACAACGCAGA CCGTTCCGTG 5670 

GCAAAGCAAA AGTTCAAAAT CACCAACTGG TCCACCTACA ACAAAGCTCT CATCAACCGT 5730 

GGCTCCCTCA CTTTCTGGCT GGATGATGGG GCGATTCAGG CCTGGTATGA GTCAGCAACA 5790 

CCTTCTTCAC GAGGCAGACC TCAGCGCCCC CCCCCCCCTG CAGGTCGA 5838

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 397 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu Gly Thr Val Thr Leu
1 5 10 15

Asp Ala Val Gly Ile Gly Leu Val Met Pro Val Leu Pro Gly Leu Leu
20 25 30

Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser His Tyr Gly Val Leu
35 40 45

Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys Ala Pro Val Leu Gly
50 55 60

Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val Leu Leu Ala Ser Leu
65 70 75 80

Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala Thr Thr Pro Val Leu
85 90 95

Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly Ile Thr Gly Ala Thr
100 105 110

Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile Thr Asp Gly Glu Asp
115 120 125

Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys Phe Gly Val Gly Met
130 135 140

Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly Ala Ile Ser Leu His
145 150 155 160

Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly Leu Asn Leu Leu Leu
165 170 175

Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly Glu Arg Arg Pro Met
180 185 190

Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe Arg Trp Ala Arg Gly
195 200 205

Met Thr Ile Val Ala Ala Leu Met Thr Val Phe Phe Ile Met Gln Leu
210 215 220

Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile Phe Gly Glu Asp Arg
225 230 235 240

Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser Leu Ala Val Phe Gly
245 250 255

Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr Gly Pro Ala Thr Lys
260 265 270

Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly Met Ala Ala Asp Ala
275 280 285

Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg Gly Trp Met Ala Phe
290 295 300

Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile Gly Met Pro Ala Leu
305 310 315 320

Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp His Gln Gly Gln Leu
325 330 335

Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr Ser Ile Thr Gly Pro
340 345 350

Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala Ser Thr Trp Asn Gly
355 360 365

Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu Val Cys Leu Pro Ala
370 375 380

Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser Thr *
385 390 395

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 220 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Glu Lys Lys Ile Thr Gly Tyr Thr Thr Val Asp Ile Ser Gln Trp
1 5 10 15

His Arg Lys Glu His Phe Glu Ala Phe Gln Ser Val Ala Gln Cys Thr
20 25 30

Tyr Asn Gln Thr Val Gln Leu Asp Ile Thr Ala Phe Leu Lys Thr Val
35 40 45

Lys Lys Asn Lys His Lys Phe Tyr Pro Ala Phe Ile His Ile Leu Ala
50 55 60

Arg Leu Met Asn Ala His Pro Glu Phe Arg Met Ala Met Lys Asp Gly
65 70 75 80

Glu Leu Val Ile Trp Asp Ser Val His Pro Cys Tyr Thr Val Phe His
85 90 95

Glu Gln Thr Glu Thr Phe Ser Ser Leu Trp Ser Glu Tyr His Asp Asp
100 105 110

Phe Arg Gln Phe Leu His Ile Tyr Ser Gln Asp Val Ala Cys Tyr Gly
115 120 125

Glu Asn Leu Ala Tyr Phe Pro Lys Gly Phe Ile Glu Asn Met Phe Phe
130 135 140

Val Ser Ala Asn Pro Trp Val Ser Phe Thr Ser Phe Asp Leu Asn Val
145 150 155 160

Ala Asn Met Asp Asn Phe Phe Ala Pro Val Phe Thr Met Gly Lys Tyr
165 170 175

Tyr Thr Gln Gly Asp Lys Val Leu Met Pro Leu Ala Ile Gln Val His
180 185 190

His Ala Val Cys Asp Gly Phe His Val Gly Arg Met Leu Asn Glu Leu
195 200 205

Gln Gln Tyr Cys Asp Glu Trp Gln Gly Gly Ala *
210 215 220

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 272 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Ser His Ile Gln Arg Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn
1 5 10 15

Ser Asn Met Asp Ala Asp Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn
20 25 30

Val Gly Gln Ser Gly Ala Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp
35 40 45

Ala Pro Glu Leu Phe Leu Lys His Gly Lys Gly Ser Val Ala Asn Asp
50 55 60

Val Thr Asp Glu Met Val Arg Leu Asn Trp Leu Thr Glu Phe Met Pro
65 70 75 80

Leu Pro Thr Ile Lys His Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu
85 90 95

Leu Thr Thr Ala Ile Pro Gly Lys Thr Ala Phe Gln Val Leu Glu Glu
100 105 110

Tyr Pro Asp Ser Gly Glu Asn Ile Val Asp Ala Leu Ala Val Phe Leu
115 120 125

Arg Arg Leu His Ser Ile Pro Val Cys Asn Cys Pro Phe Asn Ser Asp
130 135 140

Arg Val Phe Arg Leu Ala Gln Ala Gln Ser Arg Met Asn Asn Gly Leu
145 150 155 160

Val Asp Ala Ser Asp Phe Asp Asp Glu Arg Asn Gly Trp Pro Val Glu
165 170 175

Gln Val Trp Lys Glu Met His Lys Leu Leu Pro Phe Ser Pro Asp Ser
180 185 190

Val Val Thr His Gly Asp Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu
195 200 205

Gly Lys Leu Ile Gly Cys Ile Asp Val Gly Arg Val Gly Ile Ala Asp
210 215 220

Arg Tyr Gln Asp Leu Ala Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser
225 230 235 240

Pro Ser Leu Gln Lys Arg Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro
245 250 255

Asp Met Asn Lys Leu Gln Phe His Leu Met Leu Asp Glu Phe Phe *
260 265 270

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Tn5 wild type outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTGACTCTTA TACACAAGT 19

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Tn5 mutant outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CTGTCTCTTA TACACATCT 19

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Tn5 mutant outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTGTCTCTTA TACAGATCT 19

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Tn5 wild type inside end" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CTGTCTCTTG ATCAGATCT 19 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19182 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Plasmid pRZ4196" 

(ix) FEATURE: 

(A) NAME/KEY: repeat_unit 

(B) LOCATION: 94..112

(D) OTHER INFORMATION: /note= "Wild type OE sequence"

(ix) FEATURE:

(A) NAME/KEY: repeat_unit

(B) LOCATION: 12184..12225

(D) OTHER INFORMATION: /note= "Cassette IE"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:



TTCCTGTAAC AATAGCAATA CCCCAAATAC CTAATGTAGT TCCAGCAAGC AAGCTAAAAA 60
GTAAAGCAAC AACATAACTC ACCCCTGCAT CTGCTGACTC TTATACACAA GTAGCGTCCC 120
GGGATCGGGA TCCCGTCGTT TTACAACGTC GTGACTGGGA AAACCCTGGC GTTACCCAAC 180
TTAATCGCCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA 240
CCGATCGCCC TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC 300
CGGCACCAGA AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACTG 360
TCGTCGTCCC CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA 420
CCTATCCCAT TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT 480
CGCTCACATT TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG 540
ATGGCGTTAA CTCGGCGTTT CATCTGTGGT GCAACGGGCG CTGGGTCGGT TACGGCCAGG 600
ACAGTCGTTT GCCGTCTGAA TTTGACCTGA GCGCATTTTT ACGCGCCGGA GAAAACCGCC 660
TCGCGGTGAT GGTGCTGCGT TGGAGTGACG GCAGTTATCT GGAAGATCAG GATATGTGGC 720
GGATGAGCGG CATTTTCCGT GACGTCTCGT TGCTGCATAA ACCGACTACA CAAATCAGCG 780
ATTTCCATGT TGCCACTCGC TTTAATGATG ATTTCAGCCG CGCTGTACTG GAGGCTGAAG 840
TTCAGATGTG CGGCGAGTTG CGTGACTACC TACGGGTAAC AGTTTCTTTA TGGCAGGGTG 900
AAACGCAGGT CGCCAGCGGC ACCGCGCCTT TCGGCGGTGA AATTATCGAT GAGCGTGGTG 960
GTTATGCCGA TCGCGTCACA CTACGTCTGA ACGTCGAAAA CCCGAAACTG TGGAGCGCCG 1020
AAATCCCGAA TCTCTATCGT GCGGTGGTTG AACTGCACAC CGCCGACGGC ACGCTGATTG 1080
AAGCAGAAGC CTGCGATGTC GGTTTCCGCG AGGTGCGGAT TGAAAATGGT CTGCTGCTGC 1140
TGAACGGCAA GCCGTTGCTG ATTCGAGGCG TTAACCGTCA CGAGCATCAT CCTCTGCATG 1200
GTCAGGTCAT GGATGAGCAG ACGATGGTGC AGGATATCCT GCTGATGAAG CAGAACAACT 1260
TTAACGCCGT GCGCTGTTCG CATTATCCGA ACCATCCGCT GTGGTACACG CTGTGCGACC 1320
GCTACGGCCT GTATGTGGTG GATGAAGCCA ATATTGAAAC CCACGGCATG GTGCCAATGA 1380
ATCGTCTGAC CGATGATCCG CGCTGGCTAC CGGCGATGAG CGAACGCGTA ACGCGAATGG 1440
TGCAGCGCGA TCGTAATCAC CCGAGTGTGA TCATCTGGTC GCTGGGGAAT GAATCAGGCC 1500
ACGGCGCTAA TCACGACGCG CTGTATCGCT GGATCAAATC TGTCGATCCT TCCCGCCCGG 1560
TGCAGTATGA AGGCGGCGGA GCCGACACCA CGGCCACCGA TATTATTTGC CCGATGTACG 1620
CGCGCGTGGA TGAAGACCAG CCCTTCCCGG CTGTGCCGAA ATGGTCCATC AAAAAATGGC 1680
TTTCGCTACC TGGAGAGACG CGCCCGCTGA TCCTTTGCGA ATACGCCCAC GCGATGGGTA 1740
ACAGTCTTGG CGGTTTCGCT AAATACTGGC AGGCGTTTCG TCAGTATCCC CGTTTACAGG 1800
GCGGCTTCGT CTGGGACTGG GTGGATCAGT CGCTGATTAA ATATGATGAA AACGGCAACC 1860
CGTGGTCGGC TTACGGCGGT GATTTTGGCG ATACGCCGAA CGATCGCCAG TTCTGTATGA 1920
ACGGTCTGGT CTTTGCCGAC CGCACGCCGC ATCCAGCGCT GACGGAAGCA AAACACCAGC 1980
AGCAGTTTTT CCAGTTCCGT TTATCCGGGC AAACCATCGA AGTGACCAGC GAATACCTGT 2040
TCCGTCATAG CGATAACGAG CTCCTGCACT GGATGGTGGC GCTGGATGGT AAGCCGCTGG 2100
CAAGCGGTGA AGTGCCTCTG GATGTCGCTC CACAAGGTAA ACAGTTGATT GAACTGCCTG 2160
AACTACCGCA GCCGGAGAGC GCCGGGCAAC TCTGGCTCAC AGTACGCGTA GTGCAACCGA 2220
ACGCGACCGC ATGGTCAGAA GCCGGGCACA TCAGCGCCTG GCAGCAGTGG CGTCTGGCGG 2280
AAAACCTCAG TGTGACGCTC CCCGCCGCGT CCCACGCCAT CCCGCATCTG ACCACCAGCG 2340
AAATGGATTT TTGCATCGAG CTGGGTAATA AGCGTTGGCA ATTTAACCGC CAGTCAGGCT 2400
TTCTTTCACA GATGTGGATT GGCGATAAAA AACAACTGCT GACGCCGCTG CGCGATCAGT 2460
TCACCCGTGC ACCGCTGGAT AACGACATTG GCGTAAGTGA AGCGACCCGC ATTGACCCTA 2520
ACGCCTGGGT CGAACGCTGG AAGGCGGCGG GCCATTACCA GGCCGAAGCA GCGTTGTTGC 2580
AGTGCACGGC AGATACACTT GCTGATGCGG TGCTGATTAC GACCGCTCAC GCGTGGCAGC 2640
ATCAGGGGAA AACCTTATTT ATCAGCCGGA AAACCTACCG GATTGATGGT AGTGGTCAAA 2700
TGGCGATTAC CGTTGATGTT GAAGTGGCGA GCGATACACC GCATCCGGCG CGGATTGGCC 2760
TGAACTGCCA GCTGGCGCAG GTAGCAGAGC GGGTAAACTG GCTCGGATTA GGGCCGCAAG 2820
AAAACTATCC CGACCGCCTT ACTGCCGCCT GTTTTGACCG CTGGGATCTG CCATTGTCAG 2880
ACATGTATAC CCCGTACGTC TTCCCGAGCG AAAACGGTCT GCGCTGCGGG ACGCGCGAAT 2940
TGAATTATGG CCCACACCAG TGGCGCGGCG ACTTCCAGTT CAACATCAGC CGCTACAGTC 3000

AACAGCAACT GATGGAAACC AGCCATCGCC ATCTGCTGCA CGCGGAAGAA GGCACATGGC 3060 

TGAATATCGA CGGTTTCCAT ATGGGGATTG GTGGCGACGA CTCCTGGAGC CCGTCAGTAT 3120 

CGGCGGATTC CAGCTGAGCG CCGGTCGCTA CCATTACCAG TTGGTCTGGT GTCAAAAATA 3180 

ATAATAACCG GGCAGGCCAT GTCTGCCCGT ATTTCGCGTA AGGAAATCCA TTATGTACTA 3240 

TTTAAAAAAC ACAAACTTTT GGATGTTCGG TTTATTCTTT TTCTTTTACT TTTTTATCAT 3300 

GGGAGCCTAC TTCCCGTTTT TCCCGATTTG GCTACATGAC ATCAACCATA TCAGCAAAAG 3360 

TGATACGGGT ATTATTTTTG CCGCTATTTC TCTGTTCTCG CTATTATTCC AACCGCTGTT 3420 

TGGTCTGCTT TCTGACAAAC TCGGGCTGCG CAAATACCTG CTGTGGATTA TTACCGGCAT 3480 

GTTAGTGATG TTTGCGCCGT TCTTTATTTT TATCTTCGGG CCACTGTTAC AATACAACAT 3540 

TTTAGTAGGA TCGATTGTTG GTGGTATTTA TCTAGGCTTT TGTTTTAACG CCGGTGCGCC 3600 

AGCAGTAGAG GCATTTATTG AGAAAGTCAG CCGTCGCAGT AATTTCGAAT TTGGTCGCGC 3660 

GCGGATGTTT GGCTGTGTTG GCTGGGCGCT GTGTGCCTCG ATTGTCGGCA TCATGTTCAC 3720 

CATCAATAAT CAGTTTGTTT TCTGGCTGGG CTCTGGCTGT GCACTCATCC TCGCCGTTTT 3780 

ACTCTTTTTC GCCAAAACGG ATGCGCCCTC TTCTGCCACG GTTGCCAATG CGGTAGGTGC 3840

CAACCATTCG GCATTTAGCC TTAAGCTGGC ACTGGAACTG TTCAGACAGC CAAAACTGTG 3900 

GTTTTTGTCA CTGTATGTTA TTGGCGTTTC CTGCACCTAC GATGTTTTTG ACCAACAGTT 3960 

TGCTAATTTC TTTACTTCGT TCTTTGCTAC CGGTGAACAG GGTACGCGGG TATTTGGCTA 4020

CGTAACGACA ATGGGCGAAT TACTTAACGC CTCGATTATG TTCTTTGCGC CACTGATCAT 4080 

TAATCGCATC GGTGGGAAAA ACGCCCTGCT GCTGGCTGGC ACTATTATGT CTGTACGTAT 4140 

TATTGGCTCA TCGTTCGCCA CCTCAGCGCT GGAAGTGGTT ATTCTGAAAA CGCTGCATAT 4200 

GTTTGAAGTA CCGTTCCTGC TGGTGGGCTG CTTTAAATAT ATTACCAGCC AGTTTGAAGT 4260 

GCGTTTTTCA GCGACGATTT ATCTGGTCTG TTTCTGCTTC TTTAAGCAAC TGGCGATGAT 4320 

TTTTATGTCT GTACTGGCGG GCAATATGTA TGAAAGCATC GGTTTCCAGG GCGCTTATCT 4380 

GGTGCTGGGT CTGGTGGCGC TGGGCTTCAC CTTAATTTCC GTGTTCACGC TTAGCGGCCC 4440 

CGGCCCGCTT TCCCTGCTGC GTCGTCAGGT GAATGAAGTC GCTTAAGCAA TCAATGTCGG 4500 

ATGCGGCGCG ACGCTTATCC GACCAACATA TCATAACGGA GTGATCGCAT TGAACATGCC 4560 

AATGACCGAA AGAATAAGAG CAGGCAAGCT ATTTACCGAT ATGTGCGAAG GCTTACCGGA 4620 

AAAAAGACTT CGTGGGAAAA CGTTAATGTA TGAGTTTAAT CACTCGCATC CATCAGAAGT 4680 

TGAAAAAAGA GAAAGCCTGA TTAAAGAAAT GTTTGCCACG GTAGGGGAAA ACGCCTGGGT 4740 

AGAACCGCCT GTCTATTTCT CTTACGGTTC CAACATCCAT ATAGGCCGCA ATTTTTATGC 4800 

AAATTTCAAT TTAACCATTG TCGATGACTA CACGGTAACA ATCGGTGATA ACGTACTGAT 4860 

TGCACCCAAC GTTACTCTTT CCGTTACGGG ACACCCTGTA CACCATGAAT TGAGAAAAAA 4920 

CGGCGAGATG TACTCTTTTC CGATAACGAT TGGCAATAAC GTCTGGATCG GAAGTCATGT 4980
GGTTATTAAT CCAGGCGTCA CCATCGGGGA TAATTCTGTT ATTGGCGCGG GTAGTATCGT 5040
CACAAAAGAC ATTCCACCAA ACGTCGTGGC GGCTGGCGTT CCTTGTCGGG TTATTCGCGA 5100
AATAAACGAC CGGGATAAGC ACTATTATTT CAAAGATTAT AAAGTTGAAT CGTCAGTTTA 5160
AATTATAAAA ATTGCCTGAT ACGCTGCGCT TATCAGGCCT ACAAGTTCAG CGATCTACAT 5220
TAGCCGCATC CGGCATGAAC AAAGCGCAGG AACAAGCGTC GCATCATGCC TCTTTGACCC 5280
ACAGCTGCGG AAAACGTACT GGTGCAAAAC GCAGGGTTAT GATCATCAGC CCAACGACGC 5340
ACAGCGCATG AAATGCCCAG TCCATCAGGT AATTGCCGCT GATACTACGC AGCACGCCAG 5400
AAAACCACGG GGCAAGCCCG GCGATGATAA AACCGATTCC CTGCATAAAC GCCACCAGCT 5460
TGCCAGCAAT AGCCGGTTGC ACAGAGTGAT CGAGCGCCAG CAGCAAACAG AGCGGAAACG 5520
CGCCGCCCAG ACCTAACCCA CACACCATCG CCCACAATAC CGGCAATTGC ATCGGCAGCC 5580
AGATAAAGCC GCAGAACCCC ACCAGTTGTA ACACCAGCGC CAGCATTAAC AGTTTGCGCC 5640
GATCCTGATG GCGAGCCATA GCAGGCATCA GCAAAGCTCC TGCGGCTTGC CCAAGCGTCA 5700
TCAATGCCAG TAAGGAACCG CTGTACTGCG CGCTGGCACC AATCTCAATA TAGAAAGCGG 5760
GTAACCAGGC AATCAGGCTG GCGTAACCGC CGTTAATCAG ACCGAAGTAA ACACCCAGCG 5820
TCCACGCGCG GGGAGTGAAT ACCACGCGAA CCGGAGTGGT TGTTGTCTTG TGGGAAGAGG 5880
CGACCTCGCG GGCGCTTTGC CACCACCAGG CAAAGAGCGC AACAACGGCA GGCAGCGCCA 5940
CCAGGCGAGT GTTTGATACC AGGTTTCGCT ATGTTGAACT AACCAGGGCG TTATGGCGGC 6000
ACCAAGCCCA CCGCCGCCCA TCAGAGCCGC GGACCACAGC CCCATCACCA GTGGCGTGCG 6060
CTGCTGAAAC CGCCGTTTAA TCACCGAAGC ATCACCGCCT GAATGATGCC GATCCCCACC 6120
CCACCAAGCA GTGCGCTGCT AAGCAGCAGC GCACTTTGCG GGTAAAGCTC ACGCATCAAT 6180
GCACCGACGG CAATCAGCAA CAGACTGATG GCGACACTGC GACGTTCGCT GACATGCTGA 6240
TGAAGCCAGC TTCCGGCCAG CGCCAGCCCG CCCATGGTAA CCACCGGCAG AGCGGTCGAC 6300
CCGGACGGGA CGCTCCTGCG CCTGATACAG AACGAATTGC TTGCAGGCAT CTCATGAGTG 6360
TGTCTTCCCG TTTTCCGCCT GAGGTCACTG CGTGGATGGA GCGCTGGCGC CTGCTGCGCG 6420
ACGGCGAGCT GCTCACCACC CACTCGAGCT GGATACTTCC CGTCCGCCAG GGGGACATGC 6480
CGGCGATGCT GAAGGTCGCG CGCATTCCCG ATGAAGAGGC CGGTTACCGC CTGTTGACCT 6540
GGTGGGACGG GCAGGGCGCC GCCCGAGTCT TCGCCTCGGC GGCGGGCGCT CTGCTCATGG 6600
AGCGCGCGTC CGGGGCCGGG GACCTTGCAC AGATAGCGTG GTCCGGCCAG GACGACGAGG 6660
CTTGCAGGAT CTATGATTCC CTTTGTCAAC AGCAATGGAT CACTGAAAAT GGTTCAATGA 6720
TCACATTAAG TGGTATTCAA TATTTTCATG AAATGGGAAT TGACGTTCCT TCCAAACATT 6780
CACGTAAAAT CTGTTGTGCG TGTTTAGATT GGAGTGAACG CCGTTTCCAT TTAGGTGGGT 6840
ACGTTGGAGC CGCATTATTT TCGCTTTATG AATCTAAAGG GTGGTTAACT CGACATCTTG 6900
GTTACCGTGA AGTTACCATC ACGGAAAAAG GTTATGCTGC TTTTAAGACC CACTTTCACA 6960
TTTAAGTTGT TTTTCTAATC CGCATATGAT CAATTCAAGG CCGAATAAGA AGGCTGGCTC 7020
PCT/US97/15941

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TGCACCTTGG TGATCAAATA ATTCGATAGC TTGTCGTAAT AATGGCGGCA TACTATCAGT 7080

AGTAGGTGTT TCCCTTTCTT CTTTAGCGAC TTGATGCTCT TGATCTTCCA ATACGCAACC 7140

TAAAGTAAAA TGCCCCACAG CGCTGAGTGC ATATAATGCA TTCTCTAGTG AAAAACCTTG 7200

TTGGCATAAA AAGGCTAATT GATTTTCGAG AGTTTCATAC TGTTTTTCTG TAGGCCGTGT 7260

ACCTAAATGT ACTTTTGCTC CATCGCGATG ACTTAGTAAA GCACATCTAA AACTTTTAGC 7320

GTTATTACGT AAAAAATCTT GCCAGCTTTC CCCTTCTAAA GGGCAAAAGT GAGTATGGTG 7380

CCTATCTAAC ATCTCAATGG CTAAGGCGTC GAGCAAAGCC CGCTTATTTT TTACATGCCA 7440

ATACAATGTA GGCTGCTCTA CACCTAGCTT CTGGGCGAGT TTACGGGTTG TTAAACCTTC 7500

GATTCCGACC TCATTAAGCA GCTCTAATGC GCTGTTAATC ACTTTACTTT TATCTAATCT 7560

AGACATCATT AATTCCTAAT TTTTGTTGAC ACTCTATCAT TGATAGAGTT ATTTTACCAC 7620

TCCCTATCAG TGATAGAGAA AAGTGAAATG AATAGTTCGA CAAAGATCGC ATTGGTAATT 7680

ACGTTACTCG ATGCCATGGG GATTGGCCTT ATCATGCCAG TGTTGCCAAC GTTATTACGT 7740

GAATTTATTG CTTCGGAAGA TATCGCTAAC CACTTTGGCG TATTGCTTGC ACTTTATGCG 7800

TTAATGCAGG TTATCTTTGC TCCTTGGCTT GGAAAAATGT CTGACCGATT TGGTCGGCGC 7860

CCAGTGCTGT TGTTGTCATT AATAGGCGCA TCGCTGGATT ACTTATTGCT GGCTTTTTCA 7920

AGTGCGCTTT GGATGCTGTA TTTAGGCCGT TTGCTTTCAG GGATCACAGG AGCTACTGGG 7980

GCTGTCGCGG CATCGGTCAT TGCCGATACC ACCTCAGCTT CTCAACGCGT GAAGTGGTTC 8040

GGTTGGTTAG GGGCAAGTTT TGGGCTTGGT TTAATAGCGG GGCCTATTAT TGGTGGTTTT 8100

GCAGGAGAGA TTTCACCGCA TAGTCCCTTT TTTATCGCTG CGTTGCTAAA TATTGTCACT 8160

TTCCTTGTGG TTATGTTTTG GTTCCGTGAA ACCAAAAATA CACGTGATAA TACAGATACC 8220

GAAGTAGGGG TTGAGACGCA ATCGAATTCG GTATACATCA CTTTATTTAA AACGATGCCC 8280

ATTTTGTTGA TTATTTATTT TTCAGCGCAA TTGATAGGCC AAATTCCCGC AACGGTGTGG 8340

GTGCTATTTA CCGAAAATCG TTTTGGATGG AATAGCATGA TGGTTGGCTT TTCATTAGCG 8400

GGTCTTGGTC TTTTACACTC AGTATTCCAA GCCTTTGTGG CAGGAAGAAT AGCCACTAAA 8460

TGGGGCGAAA AAACGGCAGT ACTGCTCGAA TTTATTGCAG ATAGTAGTGC ATTTGCCTTT 8520

TTAGCGTTTA TATCTGAAGG TTGGTTAGAT TTCCCTGTTT TAATTTTATT GGCTGGTGGT 8580

GGGATCGCTT TACCTGCATT ACAGGGAGTG ATGTCTATCC AAACAAAGAG TCATGAGCAA 8640

GGTGCTTTAC AGGGATTATT GGTGAGCCTT ACCAATGCAA CCGGTGTTAT TGGCCCATTA 8700

CTGTTTACTG TTATTTATAA TCATTCACTA CCAATTTGGG ATGGCTGGAT TTGGATTATT 8760

GGTTTAGCGT TTTACTGTAT TATTATCCTG CTATCGATGA CCTTCATGTT AACCCCTCAA 8820

GCTCAGGGGA GTAAACAGGA GACAAGTGCT TAGTTATTTC GTCACCAAAT GATGTTATTC 8880

CGCGAAATAT AATGACCCTC TTGATAACCC AAGAGGGCAT TTTTTACGAT AAAGAAGATT 8940

TAGCTTCAAA TAAAACCTAT CTATTTTATT TATCTTTCAA GCTCAATAAA AAGCCGCGGT 9000

AAATAGCAAT AAATTGGCCT TTTTTATCGG CAAGCTCTTT TAGGTTTTTC GCATGTATTG 9060
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CGATATGCAT AAACCAGCCA TTGAGTAAGT TTTTAAGCAC ATCACTATCA TAAGCTTTAA 9120

GTTGGTTCTC TTGGATCAAT TTGCTGACAA TGGCGTTTAC CTTACCAGTA ATGTATTCAA 9180

GGCTAATTTT TTCAAGTTCA TTCCAACCAA TGATAGGCAT CACTTCTTGG ATAGGGATAA 9240

GGTTTTTATT ATTATCAATA ATATAATCAA GATAATGTTC AAATATACTT TCTAAGGCAG 9300

ACCAACCATT TGTTAAATCA GTTTTTGTTG TGATGTAGGC ATCAATCATA ATTAATTGCT 9360

GCTTATAACA GGCACTGAGT AATTGTTTTT TATTTTTAAA GTGATGATAA AAGGCACCTT 9420

TGGTCACCAA CGCTTTTCCC GAGATCCTCT GCGACACCGC CGCTCGTCTG CACGCGCCGC 9480

GGTCCGGACC GCCGCCCGAT CTCCATCCGC TACAGGAATG GTTCCAGCCG CTTTTCCGGT 9540

TGGCCGCTGA GCACGCGGCA CTTGCGCCCG CCGCCAGCGT AGCGCGCCAA CTTCTGGCGG 9600

CGCCGCGCGA GGTGTGCCCG CTCCACGGCG ACCTGCACCA CGAGAACGTG CTCGACTTCG 9660

GCGACCGCGG CTGGCTGGCC ATCGACCCGC ACGGACTGCT CGGCGAGCGC ACCTTCGACT 9720

ATGCCAACAT CTTCACGAAT CCCGATCTCA GCGACCCCGG TCGCCCGCTT GCGATCCTGC 9780

CGGGCAGGCT GGAGGCTCGA CTCAGCATTG TGGTCGCGAC GACCGGGTTT GAGCCCGAAC 9840

GGCTTCTTCG CTGGATCATT GCATGGACGG GCTTGTCGGC AGCCTGGTTC ATCGGCGACG 9900

GCGACGGCGA GGGCGAGGGC GCTGCGATTG ATCTGGCCGT AAACGCCATG GCACGCCGGT 9960

TGCTTGACTA GCGCGGTCAC CGATCTCACC TGGTCGTCGA GCTAGGTCAG GCCGTGTCGG 10020

GCGTGATCCG CTGGAAGTCG TTGCGGGCCA CACCCGCCGC CTCGAAGCCC TGCACCAGGC 10080

CGGCATCGTG GTGTGCGTGG CCGAGGGACT ATGGAAGGTG CCGGACGATC TGCCCGAGCA 10140

GGGCCGCCGC TATGACGCCC AGCGTCTTGG TGGCGTGACG GTGGAGCTGA AATCGCACCT 10200

GCCCATCGAG CGGCAGGCCC GCGTGATCGG TGCCACCTGG CTTGACCAGC AGTTGATCGA 10260

CGGTGGCTCG GGCTTGGGCG ACCTGGGCTT TAGCAGTGAG GCCAAGTAGG CGATACAGCA 10320

GCGCGCGGAC TTCCTGGCCG AACAGGGACT GGCCGAGCGG CGCGGGCAGC GCGTGATCCT 10380

CACCGGAATC TGCTGGGCAG CAGCGGGCTC GGGAACTGGC GCAGGCCGCG AAGGACATTG 10440

CCGCCGATAC CGGCCTGGAG CATCGCCCCG TGGCCGACGG CCAGCGCGTT GCCGGCGTCT 10500

ACCGGCGCCC CGTCATGCTC GCCAGCGGGC GAAATGGGAT GCTTGATGAC GCCAAGGGGT 10560

CCAGCCTCGT GCGGTGGAAG CCCATCGAAC AGCGGCTTGG GGAGCAGCTC GCCGCGACGG 10620

TGCGCGGTGG CGGCGTGTCT TGGGAGATTG GACGACAGCG TGGGCCGGCC CCTGTCTCTT 10680

GATCAGATCT TGATCCCCTG CGCCATCAGA TCCTTGGCGG CAAGAAAGCC ATCCAGTTTA 10740

CTTTGCAGGG CTTCCCAACC TTCCCAGAGG GCGCCCCAGC TGGCAATTCC GGTTCGCTTG 10800

CTGTCCATAA AACCGCCCAG TCTAGCTATC GCCATGTAAG CCCACTGCAA GCTACCTGCT 10860

TTCTCTTTGC GCTTGCGTTT TCCCTTGTCC AGATAGCCCA GTAGCTGACA TTCATCCGGG 10920

GTCAGCACCG TTTCTGCGGA CTGGCTTTCT ACGTGTTCCG CTTCCTTTAG CAGCCCTTGC 10980

GCCCTGAGTG CTTGCGGCAG CGTGAAGCTT TCTCTGAGCT GTAACAGCCT GACCGCAACA 11040

AACGAGAGGA TCGAGACCAT CCGCTCCAGA TTATCCGGCT CCTCCATGCG TTGCCTCTCG 11100
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GCTCCTGCTC CGGTTTTCCA TGCCTTATGG AACTCCTCGA TCCGCCAGCG ATGGGTATAA 11160 

ATGTCGATGA CGCGCAAGGC TTGGGCTAGC GACTCGACCG GTTCGCCGGT CAGCAACAAC 11220 

CATTTCAACG GGGTCTCACC CTTGGGCGGG TTAATCTCCT CGGCCAGCAC CGCGTTGAGC 11280 

GTGATATTCC CCTGTTTTAG CGTGATGCGC CCACTGCGCA GGCTCAAGCT CGCCTTGCGG 11340 

GCTGGTCGAT TTTTACGTTT ACCGCGTTTA TCCACCACGC CCTTTTGCGG AATGCTGATC 11400 

TGATAGCCAC CCAACTCCGG TTGGTTCTTC AGATGGTCGA TCAGATACAA CCCAGACTCT 11460 

ACGTCCTTGC GTGGGTGCTT GGAGCGCACC ACGAAGCGCT CGTTATGCGC CAGCCTGTCC 11520 

TGCAGATAAG CATGAATATC GGCTTCGCGG TCACAGACCG CAATCACGTT GCTCATCATG 11580 

CTGCCCATGC GTAACCGGCT AGTTGCGGCC GCTGCCAGCC ATTTGCCACT CTCCTTTTCA 11640 

TCCGCATCGG CAGGGTCATC CGGGCGCATC CACCACTCCT GATGCAGTAA TCCTACGGTG 11700 

CGGAATGTGG TGGCCTCGAG CAAGAGAACG GAGTGAACCC ACCATCCGCG GGATTTATCC 11760 

TGAATAGAGC CCAGCTTGCC AAGCTCTTCG GCGACCTGGT GGCGATAACT CAAAGAGGTG 11820 

GTGTCCTCAA TGGCCAGCAG TTCGGGAAAC TCCTGAGCCA ACTTGACTGT TTGCATGGCG 11880 

CCAGCCTTTC TGATCGCCTC GGCAGAAACG TTGGGATTGC GGATAAATCG GTAAGCGCCT 11940 

TCCTGCATGG CTTCACTACC CTCTGATGAG ATGGTTATTG ATTTACCAGA ATATTTTGCC 12000 

AATTGGGCGG CGACGTTAAC CAAGCGGGCA GTACGGCGAG GATCACCCAG CGCCGCCGAA 12060 

GAGAACACAG ATTTAGCCCA GTCGGCCGCA CGATGAAGAG CAGAAGTTAT CATGAACGTT 12120 

ACCATGTTAG GAGGTCACAT GGAAGATCAG ATCCTGGAAA ACGGGAAAGG TTCCGTTCGA 12180 

ATTGCATGCG GATCCGGGAT CAAGATCTGA TCAAGAGACA GGTACCAATT GTTGAAGACG 12240

AAAGGGCCTC GTGATACGCC TATTTTTATA GGTTAATGTC ATGATAATAA TGGTTTCTTA 12300 

GACGTCAGGT GGCACTTTTC GGGGAAATGT GCGCGGAACC CCTATTTGTT TATTTTTCTA 12360 

AATACATTCA AATATGTATC CGCTCATGAG ACAATAACCC TGATAAATGC TTCAATAATA 12420 

TTGAAAAAGG AAGAGTATGA GTATTCAACA TTTCCGTGTC GCCCTTATTC CCTTTTTTGC 12480 

GGCATTTTGC CTTCCTGTTT TTGCTCACCC AGAAACGCTG GTGAAAGTAA AAGATGCTGA 12540 

AGATCAGTTG GGTGCACGAG TGGGTTACAT CGAACTGGAT CTCAACAGCG GTAAGATCCT 12600 

TGAGAGTTTT CGCCCCGAAG AACGTTTTCC AATGATGAGC ACTTTTAAAG TTCTGCTATG 12660 

TGGCGCGGTA TTATCCCGTG TTGACGCCGG GCAAGAGCAA CTCGGTCGCC GCATACACTA 12720 

TTCTCAGAAT GACTTGGTTG AGTACTTGGC AAACTGATCT AAATGTTTAG CCCAGTCATC 12780 

ATACTTCACC GATGCCAACG CATTAAAAAT AGCATCACGA TCGGCTTTGC TGAATTTCTT 12840 

ATTTAAAACA TCCTTGTATT TTTCAAAAGC AGCGAGAGCT TCATTCACAT TGCCGATTTT 12900 

CTTACCTTTA GACTTATCAG CAAGTTCCTG TGCCATTTTC GAATATTTTT CACCATATTT 12960 

TTCAGTCAGC GTTTGATAAA AGCTAACTGT TGCATCAACA GCATCCTTAA TCTGTGAATT 13020 

AAGGAGATTA TTCTGTGCTT TTTTCAAATT TTCTTCAGCT TCATGAACAC GAGCGATACC 13080 

GGCATTACGA TTATTACTGA CCTGAGAAAT AGCCTTCTGG ATCTGAGTTA TATCAGCATT 13140 
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TATCCGGTTA ATACGTGTTT CTGATGCTGT TACCTGTTTT TGTTTTTCTT CTCTAATCTT 
ACCGGCCCCA ACCCGTCGTC TGGTTGCTTC AAAAAAAGGA CGGTTCTGAA GCGGATCATT 
GGCTCTTGGT GATAGTTTTT TGACCAGCTC ATCCAGTTCT TTATATTTAG CGGATGCCTG 
AGCCAGTTCA TTTCGTTTTC CAGCGAGCGT TTTCATTTCT GCATCACGGG CATGGATACT 
GGAGCTTAAA CGAGAATTGA GAGTCTTAAT CTCTCCATCC ATTTTCACCA CTTCAGATTG 
TGCAGCAGAA AGTTTTTTTT GGGCGATCTC AACAGCTTTA GCTTCTTCAC TCAATGCAGC 
CAGTCGTTTC TCTTCAGCTT CAGCCAGTTT CAACTGGCGT TCTGTTTCAG CCTTCTCCCG 
TTCAATCTCT TTACGTCGTT GTTCTGCTTC CTGAAAAGCC TTTTCTGCTG CTTCCGCTTC 
TTTACGGGCT TTTTCTTCTG CTTTCGCAAG GCGCAAACGC TCTGCTTCCG CCTGCATAGC 
TGCATTATTA GCATGAGCAA GCTCTGTTGC TGAAGGCGTA CGTGAGGCAT TGTGACGAAG 
AGCCTCATTC ACGATATCCT TCAGGCGCTG AGTCAGCGCA TCCCTGTTTG CCTTTGCTTT 
CGCCTGTGCT TCCGCTGCAG CTTTTGCCCG GGCAGCCTGC TCTGCCTGTG TTTTCTTTAA 
TTGAGCAGTA GACCATTTAG CAGTTGCATG AATAGCTGCA GAACTTTCAC TTTTACTGCC 
TCCTTTTCCA CCTCCGCCGC CAGAGCCACT CCCGTCAGGA GTACCATTCA AAAGAGTAAT 
AATTACCTGT CCCTTATCAT CATAAGGAAC ACCATCTTTA TAGTACGCTA CCGCGGTTTC 
CATTATAAAA TCCTCTTTGA CTTTTAAAAC AATAAGTTAA AAATAAATAC TGTACATATA 
ACCACTGGTT TTATATACAG CATAAAAGCT ACGCCGCTGC ATTTTCCCTG TCAAGACTGT 
GGACTTCCAT TTTTGTGAAA ACGATCAAAA AAACAGTCTT TCACACCACG CGCTATTCTC 
GCCCGATGCC ACAAAAACCA GCACAAACAT TACCGTTCTC AGACCTCATT ATGTTTTACT 
GAAACTATGA GATGAGACAT CTATGGGACA CTGTCACTTT ATGGCATGGC ACACACTCCG 
GGACGCACTA AAAATGACAG GCAGATCGCG TTCACAGTTT TACCGTGATA TGCGCGGAGG 
CCTTGTCAGT TACCGTACCG GCAGGGACGG ACGACGGGAG TTTGAAACCA GTGAACTGAT 
CCGGGCATAC GGCGAATTAA AGCAGAATGA GACACCAGAA AGGCACAGTG AGGGACATGC 
AGAAAATCCA CATGATCAGC AGACAGAACG CATTCTCCGG GAACTGAATG AGCTGAAACA 
ATGCCTGACG CTGATGCTTG AGGATAAACA GGCACAGGAT ATGGATCGCA GACGCCAGGA 
AGCAGAACGG GAACAGCTAC AAAATGAGAT AGCCCAGCTC AGGCAGGCAC TGGAACTGGA 
AAAGAAACGG GGATTCTGGT CCAGGTTGTT CGGTCGCTGA ACGCTGTCAG AGACTGATGA 
TAAAATAGTC TTCGGATAAT AACTCACCGA GAATAAATAC TTTAAGGTAG GGAGAGACTC 
ATGAGACGTA CCGGAAACAA ACTTTGTCTT ATCGCCATGA TAACAGCAAC AGTAGCTCTC 
ACAGCCTGTA CCCCAAAGGG CAGCGTGGAA CAACATACCC GGCATTACGT ATATGCTTCT 
GATGACGGTT TTGATCCCAA CTTTTCCACC CAAAAAGCCG ACACAACACG AATGATGGTG 
CCTTTTTTTC GGCAGTTCTG GGATATGGGA GCTAAAGACA AAGCGACAGG AAAATCACGG 
AGTGATGTGC AACAACGCAT TCAGCAGTTT CACAGCCAAG AATTTTTAAA CTCACTCCGG 
GGCACAACTC AATTTGCGGG TACTGATTAC CGCAGCAAAG ACCTTACCCC GAAAAAATCC 
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AGGCTGCTGG CTGACACGAT TTCTGCGGTT TATCTCGATG GCTACGAGGG CAGACAGTAA 
GTGGATTTAC CATAATCCCT TAATTGTACG CACCGCTAAA ACGCGTTCAG CGCGATCACG 
GCAGCAGACA GGTAAAAATG GCAACAAACC ACCCTAAAAA CTGCGCGATC GCGCCTGATA 
AATTTTAACC GTATGAATAC CTATGCAACC AGAGGGTACA GGCCACATTA CCCCCACTTA 
ATCCACTGAA GCTGCCATTT TTCATGGTTT CACCATCCCA GCGAAGGGCC ATCCAGCGTG 
CGTTCCTGTA TTTCCGGCTG ACGCTCCCGT TCTAGGGATA ACACATGTTC GCGCTCCTGT 
ATCAGCCGTT CCTCTCTTAT CTCCAGTTCT CGCTGTATAA CTGGCTCAAG CGTTCTGTCT 
GCTCGCTCAA GTGTTGCACC TGCTGACTCA ACTGCATGAC CCGCTCGTTC AGCATCGCGT 
TGTCCCGTTG CGTAAGCGAA AACATCTTCT GCAATTCCAC GAAGGCGCTC TCCCATTCGC 
TCAGCCGCTG CATATAGTCC TGTTGCAGCT GCTCTAAGGC GTTCAGCAAA TGTGTTTCCA 
GCTCTGTCAC TCTGTGTCAC TCCTTCAGAT GTACCCACTC TTTCCCCTGA AAGGGAATCA 
CCTCCGCTGA TTTCCCGTAC GGAAGGACAA GGAATTTCCT GTTCCCGTCC TGCACAAACT 
CCACGCCCCA TGTCTTCGCG TTCAGTTTCT GCAATGTCTC TTCCTGCTTC CTGATTTCTT 
CCAGGTTCGC CTGTATCCTC CCTCCAAGAT ACCAGAGCGT CCCGCCACTC GCGGTAAACA 
GGAGAAAGAC TATCCCCAGT AACATCATGC CCGTATTCCC TGCCAGCTTT AACACGTCCC 
TCCTGTGCTG CATCATCGCC TCTTTCACCC CTTCCCGGTG TTTTTCCAGC GATTCCTCTG 
TCGAGGCTGT GAACAGGGCT ATAGCGTCTC TGATTTTCGT CTCGTTTGAT GTCACAGCCT 
CGCTTACAGA TTCGCCGAGC CTCCTGAACT CGTTGTTCAG CATTTTCTCT GTAGATTCGG 
CTCTCTCTTT CAGCTTTTTC TCGAACTCCG CGCCCGTCTG CAAAAGATTG CTCATAAAAT 
GCTCCTTTCA GCCTGATATT CTTCCCGCCG TTCGGATCTG CAATGCTGAT ACTGCTTCGC 
GTCACCCTGA CCACTTCCAG CCCCGCCTCA GTGAGCGCCT GAATCACATC CTGACGGCCT 
TTTATCTCTC CGGCATGGTA AAGTGCATCT ATACCTCGCG TGACGCCCTC AGCAAGCGCC 
TGTTTCGTTT CAGGCAGGTT ATCAGGGAGT GTCAGCGTCC TGCGGTTCTC CGGGGCGTTC
GGGTCATGCA GCCCGTAATG GTGATTTAAC AGCGTCTGCC AAGCATCAAT TCTAGGCCTG 
TCTGCGCGGT CGTAGTACGG CTGGAGGCGT TTTCCGGTCT GTAGCTCCAT GTTCGGAATG 
ACAAAATTCA GCTCAAGCCG TCCCTTGTCC TGGTGCTCCA CCCACAGGAT GCTGTACTGA 
TTTTTTTCGA GACCGGGCAT CAGTACACGC TCAAAGCTCG CCATCACTTT TTCACGTCCT 
CCCGGCGGCA GCTCCTTCTC CGCGAACGAC AGAACACCGG ACGTGTATTT CTTCGCAAAT 
GGCGTGGCAT CGATGAGTTC CCGGACTTCT TCCGGTATAC CCTGAAGCAC CGTTGCGCCT 
TCGCGGTTAC GCTCCCTCCC CAGCAGGTAA TCAACCGGAC CACTGCCACC ACCTTTTCCC 
CTGGCATGAA ATTTAACTAT CATCCCGCGC CCCCTGTTCC CTGACAGCCA GACGCAGCCG 
GCGCAGCTCA TCCCCGATGG CCATCAGTGC GGCCACCACC TGAACCCGGT CACCGGAAGA 
CCACTGCCCG CTGTTCACCT TACGGGCTGT CTGATTCAGG TTATTTCCGA TGGCGGCCAG 
CTGACGCAGT AACGGCGGTG CCAGTGTCGG CAGTTTTCCG GAACGGGCAA CCGGCTCCCC 
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CAGGCAGACC CGCCGCATCC ATACCGCCAG TTGTTTACCC TCACAGCGTT CAAGTAACCG 
GGCATGTTCA TCATCAGTAA CCCGTATTGT GAGCATCCTC TCGCGTTTCA TCGGTATCAT 
TACCCCATGA ACAGAAATCC CCCTTACACG GAGGCATCAG TGACTAAACA GGAAAAAACC 
GCCCTTAACA TGGCCCGCTT TATCAGAAGC CAGACATTAA CGCTGCTGGA GAAGCTCAAC 
GAACTGGACG CAGATGAACA GGCCGATATT TGTGAATCGC TTCACGACCA CGCCGATGAG 
CTTTACCGCA GCTGCCTCGC ACGTTTCGGG GATGACGGTG AAAACCTCTG ACACATGCAG 
CTCCCGGAGA CGGTCACAGC TTGTCTGTGA GCGGATGCCG GGAGCTGACA AGCCCGTCAG 
GGCGCGTCAG CAGGTTTTAG CGGGTGTCGG GGCGCAGCCC TGACCCAGTC ACGTAGCGAT 
AGCGGAGTGT ATACTGGCTT AACCATGCGG CATCAGTGCG GATTGTATGA AAAGTACGCC 
ATGCCGGGTG TGAAATGCCG CACAGATGCG TAAGGAGAAA ATGCACGTCC AGGCGCTTTT 
CCGCTTCCTC GCTCACTGAC TCGCTACGCT CGGTCGTTCG ACTGCGGCGA GCGGTACTGA 
CTCACACAAA AACGGTAACA CAGTTATCCA CAGAATCAGG GGATAAGGCC GGAAAGAACA 
TGTGAGCAAA AGACCAGGAA CAGGAAGAAG GCCACGTAGC AGGCGTTTTT CCATAGGCTC 
CGCCCCCCTG ACGAGCATCA CAAAAATAGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 
GGACTATAAA GCTACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC TCCTGTTCCG 
ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT CGGGAAGCGT GGCGCTTTCT
CATAGCTCAC GCTGTTGGTA TCTCAGTTCG GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT 
GTGCACGAAC CCCCCGTTCA GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG 
TCCAACCCGG TAAGGCACGC CTTAACGCCA CTGGCAGCAG CCACTGGTAA CCGGATTAGC 
AGAGCGATGA TGGCACAAAC GGTGCTACAG AGTTCTTGAA GTAGTGGCCC GACTACGGCT 
ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA 
GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGTTGG TAGCGGTGGT TTTTTTGTTT 
GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTA ATCTTTTCTA 
CTGAACCGCG ATCCCCGTCA GTTTAGAAGA GGAGGATGGT GCGATGGTCC CTCCCTGAAC
ATCAGGTATA TAGTTAGCCT GACATCCAAC AAGGAGGTTT ATCGCGAATA TTCCCACAAA 
AAATCTTTTC CTCATAACTC GATCCTTATA AAATGAAAAG AATATATGGC GAGGTTTAAT 
TTATGAGCTT AAGATACTAC ATAAAAAATA TTTTATTTGG CCTGTACTGC ACACTTATAT 
ATATATACCT TATAACAAAA AACAGCGAAG GGTATTATTT CCTTGTGTCA GATAAGATGC 
TATATGCAAT AGTGATAAGC ACTATTCTAT GTCCATATTC AAAATATGCT ATTGAATACA 
TAGCTTTTAA CTTCATAAAG AAAGATTTTT TCGAAAGAAG AAAAAACCTA AATAACGCCC 
CCGTAGCAAA ATTAAACCTA TTTATGCTAT ATAATCTACT TTGTTTGGTC CTAGCAATCC 
CATTTGGATT GCTAGGACTT TTTATATCAA TAAAGAATAA TTAAATCCCT AACACCTCAT 
TTATAGTATT AAGTTTATTC TTATCAATAT AGGAGCATAG AA 
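
The listing above is laid out in rows of six ten-nucleotide groups, with the cumulative base count printed at the end of a row. The following Python sketch shows one way such rows could be checked for transcription errors. It is a minimal illustration under stated assumptions: the function names `parse_listing_row` and `check_rows` are hypothetical, rows are assumed to carry no leading margin numbers, and only the letters A, C, G and T are accepted.

```python
def parse_listing_row(row):
    """Return (sequence, count) for one row of a sequence listing.

    Whitespace is ignored, so split groups ("ATTG CATGCG") and split
    counts ("4 980") are rejoined. count is None when the row carries
    no trailing number. Assumes no leading margin numbers in the row.
    """
    bases, digits = [], []
    for token in row.split():
        if token.isdigit():
            digits.append(token)
        elif set(token.upper()) <= set("ACGT"):
            bases.append(token.upper())
        else:
            raise ValueError(f"unexpected token: {token!r}")
    count = int("".join(digits)) if digits else None
    return "".join(bases), count


def check_rows(rows, start=0, row_length=60):
    """List (row_index, message) pairs for rows that break the layout."""
    problems = []
    expected = start
    for index, row in enumerate(rows):
        if not row.strip():
            continue  # blank separator lines carry no data
        sequence, count = parse_listing_row(row)
        expected += len(sequence)
        if len(sequence) != row_length:
            problems.append((index, f"{len(sequence)} bases, expected {row_length}"))
        if count is not None and count != expected:
            problems.append((index, f"count {count}, expected {expected}"))
            expected = count  # resynchronise on the printed count
    return problems


# Two rows taken from the listing; the preceding row ends at base 4080.
rows = [
    "TAATCGCATC GGTGGGAAAA ACGCCCTGCT GCTGGCTGGC ACTATTATGT CTGTACGTAT 4140",
    "",
    "TATTGGCTCA TCGTTCGCCA CCTCAGCGCT GGAAGTGGTT ATTCTGAAAA CGCTGCATAT 4200",
]
print(check_rows(rows, start=4080))
```

A short final row, such as the last row of the listing, would be reported as a length mismatch and has to be judged by eye.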
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1. A system for transposing a transposable DNA sequence 
in vitro, the system comprising: 

a Tn5 transposase modified relative to a wild type Tn5 
transposase, the modified transposase comprising a change 
relative to the wild type Tn5 transposase that causes the 
modified transposase to bind to Tn5 outside end repeat
sequences with greater avidity than the wild type Tn5 
transposase, and a change relative to the wild type Tn5 
transposase that causes the modified transposase to be less 
likely than the wild type transposase to assume an inactive 
multimeric form; 

a donor DNA molecule comprising the transposable DNA 
sequence, the DNA sequence being flanked at its 5'- and 3'-ends
by the Tn5 outside end repeat sequences; and 

a target DNA molecule into which the transposable element 
can transpose. 

2. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to bind with greater avidity is 
characterized as a substitution mutation at position 54 of the 
wild type transposase. 

3. A system as claimed in Claim 2 wherein position 54 is 
a lysine. 

4. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to be less likely to assume an 
inactive multimeric form is characterized as a substitution 
mutation at position 372 of the wild type transposase. 



5. A system as claimed in Claim 4 wherein position 372 
is a proline. 

6. A system as claimed in Claim 1 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 
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7. A system as claimed in Claim 6 wherein position 56 is 
an alanine.

8. A system as claimed in Claim 1 wherein the donor DNA
molecule is flanked at its 5'- and 3'-ends by an 18 or 19 base
pair flanking DNA sequence comprising nucleotide A at position
10, nucleotide T at position 11, and nucleotide A at position
12.

9. The system as claimed in Claim 8 wherein the flanking
sequence further comprises a nucleotide at position 4 selected
from the group consisting of A or T.

10. The system as claimed in Claim 8 wherein the flanking
sequence further comprises a nucleotide at position 15 selected
from the group consisting of G or C.

11. The system as claimed in Claim 8 wherein the flanking
sequence further comprises a nucleotide at position 17 selected
from the group consisting of A or T.

12. The system as claimed in Claim 8 wherein the flanking
sequence further comprises a nucleotide at position 18 selected
from the group consisting of G or C.

13. The system as claimed in Claim 8 wherein the flanking
sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

14. The system as claimed in Claim 8 wherein the flanking
sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'.
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15. A Tn5 transposase modified relative to a wild type 
Tn5 transposase, the modified transposase comprising:

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to bind to Tn5 outside end 
repeat sequences of a donor DNA with greater avidity than the 
wild type Tn5 transposase; and 

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to be less likely than the wild 
type transposase to assume an inactive multimeric form. 

16. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to bind 
with greater avidity is characterized as a substitution 
mutation at position 54 of the wild type transposase. 

17. A modified Tn5 transposase as claimed in Claim 16
wherein position 54 is a lysine. 

18. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to be 
less likely to assume an inactive multimeric form is 
characterized as a substitution mutation at position 372 of the 
wild type transposase. 



19. A modified Tn5 transposase as claimed in Claim 18 
wherein position 372 is a proline. 

20. A modified Tn5 transposase as claimed in Claim 15 
further comprising a substitution mutation at position 56 of 
the wild type transposase. 

21. A modified Tn5 transposase as claimed in Claim 20 
wherein position 56 is alanine. 

22. A genetic construct comprising a nucleotide sequence 
that can encode a Tn5 transposase that both has greater avidity 
for Tn5 outside end repeats and is less likely to assume an 
inactive multimeric form than a wild type Tn5 transposase.
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23. A genetic construct as claimed in Claim 22 comprising
a nucleotide sequence that encodes a lysine residue at amino
acid 54 of the transposase.

24. A genetic construct as claimed in Claim 22 comprising
a nucleotide sequence that encodes a proline residue at amino
acid 372 of the transposase.

25. A genetic construct as claimed in Claim 22 comprising
a nucleotide sequence that encodes a lysine residue at amino
acid 54 of the transposase and a proline residue at amino acid
372 of the transposase.

26. A genetic construct as claimed in Claim 22 comprising
the nucleotide sequence of SEQ ID NO:1.

27. A genetic construct comprising:

a transposable DNA sequence flanked at its 5' and 3' ends
by an 18 or 19 base pair flanking DNA sequence comprising
nucleotide A at position 10, nucleotide T at position 11, and
nucleotide A at position 12.

28. The construct of Claim 27 further comprising, at
position 4 of the flanking sequence, a nucleotide selected from
the group consisting of T or A.

29. The construct of Claim 27 further comprising, at
position 15 of the flanking sequence, a nucleotide selected
from the group consisting of G or C.

30. The construct of Claim 27 further comprising, at
position 17 of the flanking sequence, a nucleotide selected
from the group consisting of T or A.

31. The construct of Claim 27 further comprising, at
position 18 of the flanking sequence, a nucleotide selected
from the group consisting of G or C.
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32. The construct as claimed in Claim 27 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

33. The construct as claimed in Claim 27 wherein the
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'.

34. A method for in vitro transposition, the method 
comprising the steps of: 

combining a donor DNA molecule that comprises a 
transposable DNA sequence of interest, the DNA sequence of 
interest being flanked at its 5'- and 3'-ends by Tn5 outside
end repeat sequences, with a target DNA molecule and a Tn5 
transposase modified relative to wild type Tn5 transposase in a 
suitable reaction buffer at a temperature below a physiological 
temperature until the modified transposase binds to the outside 
end repeat sequences; and 

raising the temperature to a physiological temperature for 
a period of time sufficient for the enzyme to catalyze in vitro 
transposition, 

wherein the modified transposase comprises a change 
relative to the wild type Tn5 transposase that causes the 
modified transposase to bind to the Tn5 outside end repeat 
sequences with greater avidity than the wild type Tn5 
transposase, and a change relative to the wild type Tn5 
transposase that causes the modified transposase to be less 
likely than the wild type transposase to assume an inactive 
multimeric form. 

35. A method as claimed in Claim 34 wherein the change 
that causes the modified transposase to bind with greater 
avidity is characterized as a substitution mutation at position 
54 of the wild type transposase. 

36. A method as claimed in Claim 35 wherein position 54 
is a lysine.
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37. A method as claimed in Claim 34 wherein the change 
that causes the modified transposase to be less likely to 
assume an inactive multimeric form is characterized as a 
substitution mutation at position 372 of the wild type 
transposase.

38. A method as claimed in Claim 37 wherein position 372 
is a proline. 

39. A method as claimed in Claim 34 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 

40. A method as claimed in Claim 39 wherein position 56 
is an alanine. 

41. A method as claimed in Claim 34 wherein the DNA
sequence of interest is flanked at its 5'- and 3'-ends by an 18
or 19 base pair flanking DNA sequence comprising nucleotide A
at position 10, nucleotide T at position 11, and nucleotide A
at position 12.

42. The method as claimed in Claim 41 wherein the
flanking sequence further comprises a nucleotide at position 4
selected from the group consisting of A or T.

43. The method as claimed in Claim 41 wherein the
flanking sequence further comprises a nucleotide at position 15
selected from the group consisting of G or C.

44. The method as claimed in Claim 41 wherein the
flanking sequence further comprises a nucleotide at position 17
selected from the group consisting of A or T.

45. The method as claimed in Claim 41 wherein the
flanking sequence further comprises a nucleotide at position 18
selected from the group consisting of G or C.
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46. The method as claimed in Claim 41 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

47. The method as claimed in Claim 41 wherein the
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'.
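
The position requirements recited above for the 18 or 19 base pair flanking sequence (A at position 10, T at position 11, A at position 12, with the further options at positions 4, 15, 17 and 18) can be checked mechanically. The following Python sketch is one possible check and is offered for illustration only. The names `REQUIRED`, `OPTIONAL` and `check_flank` are hypothetical, and positions are counted from 1 at the 5' end as in the claims.

```python
# Position -> allowed nucleotides, numbered from 1 at the 5' end.
REQUIRED = {10: "A", 11: "T", 12: "A"}
OPTIONAL = {4: "AT", 15: "GC", 17: "AT", 18: "GC"}


def check_flank(seq):
    """Return (required_ok, optional_hits) for a candidate flanking sequence.

    required_ok is True when the sequence is 18 or 19 nucleotides long
    and carries A, T, A at positions 10, 11 and 12. optional_hits lists
    the optional positions (4, 15, 17, 18) that also match.
    """
    seq = seq.upper()
    if len(seq) not in (18, 19) or set(seq) - set("ACGT"):
        return False, []
    required_ok = all(seq[pos - 1] in allowed for pos, allowed in REQUIRED.items())
    optional_hits = [
        pos for pos, allowed in sorted(OPTIONAL.items()) if seq[pos - 1] in allowed
    ]
    return required_ok, optional_hits


# The two specific flanking sequences recited in the claims.
for candidate in ("CTGTCTCTTATACACATCT", "CTGTCTCTTATACAGATCT"):
    print(candidate, check_flank(candidate))
```

Both recited sequences satisfy the three required positions and all four optional ones; they differ only at position 15 (C versus G).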
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